Eero Tamminen wrote:Finally had time to try this out, and besides looking amazing, it's faster and more responsive to user input than I had hoped. It works fine in the latest Hatari from Mercurial (WinUAE CPU core build) with its 030 data cache emulation.
Eero Tamminen wrote:Below is the slowest place I could find in the ikdm4 map; normally the speed was 7-14 FPS:
DrTypo wrote:I tested this public build on my Falcon. No lightmap weirdness, it runs fine!
dml wrote:I had some quite painful experiences with the earlier Doom project because the rendering part had been created before I got anywhere near ID's source, and worked quite differently from the original.
dml wrote:It's now at the point, though, where further progress would need to involve an offline step (or a lot of waiting at runtime, which is already the case with coloured lightmaps...).
calimero wrote:just to say hi to Axis!
- The C64 demos you worked on are AMAZING! Brilliant, mind-blowing code.
BTW, I didn't know you code on the Amiga too...
AxisOxy wrote:I had similar issues with the Doom renderer. I started it completely in assembly language. Not a good idea at all! It took ages to get a working version that even halfway looked like the original Doom. Debugging was nearly impossible.
AxisOxy wrote:But unfortunately it ran at about 1 fps on my target platform, a 68030 at 50 MHz. A profiling session showed that the major problem was that the old compiler, in combination with the 68030, is extremely bad at function calls. So, first optimization steps: replace functions with macros and remove the recursion. Still having a version that works in Visual Studio is very nice for debugging.
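Removing the recursion usually means replacing the call stack with a small explicit stack. A minimal C sketch of a front-to-back BSP walk done that way, assuming a hypothetical `BspNode` layout (splitting plane `a*x + b*y + c*z = d`, two child pointers, a leaf id); it is not AxisOxy's code and leaves out the function-to-macro part of the change:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical BSP node: interior nodes split space by the plane
   a*x + b*y + c*z = d; leaves have no children and carry an id. */
typedef struct BspNode {
    float a, b, c, d;
    struct BspNode *front, *back;
    int leaf_id;                       /* only meaningful for leaves */
} BspNode;

#define BSP_STACK_MAX 64

/* Writes leaf ids into out[] in front-to-back order as seen from the
   viewer at (vx, vy, vz), without recursion. Returns the number written. */
static int bsp_front_to_back(const BspNode *root, float vx, float vy, float vz,
                             int *out, int out_max)
{
    const BspNode *stack[BSP_STACK_MAX];
    int sp = 0, count = 0;

    if (root)
        stack[sp++] = root;
    while (sp > 0) {
        const BspNode *n = stack[--sp];
        if (!n->front && !n->back) {               /* leaf */
            if (count < out_max)
                out[count++] = n->leaf_id;
            continue;
        }
        /* Which side of the splitting plane is the viewer on? */
        float side = n->a * vx + n->b * vy + n->c * vz - n->d;
        const BspNode *near_child = side >= 0.0f ? n->front : n->back;
        const BspNode *far_child  = side >= 0.0f ? n->back  : n->front;
        /* Push far first so near is popped, and fully walked, first. */
        assert(sp + 2 <= BSP_STACK_MAX);
        if (far_child)
            stack[sp++] = far_child;
        if (near_child)
            stack[sp++] = near_child;
    }
    return count;
}
```

Pushing the far child before the near one is what keeps the same visiting order as the recursive version.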
AxisOxy wrote:I'm pretty pleased with my approach of converting the maps offline. It massively helps loading times, because some of the processes (e.g. calculating shading/dithering tables for colored lightmaps, or the minimum-fit bounding spheres) take a really long time to execute. Even a second or two on a PC, and that's not a good sign. Another nice benefit of this approach is that I don't need multiple loaders for importing from different engines (Quake 3, Half-Life). Not that I'm taking any advantage of this yet. But you never know.
AxisOxy wrote:When I see your algorithm descriptions, I have to admit that I didn't expect the architecture of the 68030/DSP to have such a big influence on the algorithmic choices. The methods are really extreme, but make total sense under these conditions. In part they remind me of the good old days, when we just rotated wireframe cubes.
AxisOxy wrote:I'm not sure if this is what you are doing, but it sounds pretty promising to just collect the data during the BSP recursion and then do it old-school: transform all vertices -> cull all backfaces -> render all remaining faces. This would also make life much easier when porting parts of the code from C to ASM, because the communication protocol is much simpler: "here are 1000 vertices in a linear array, transform them!"
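As an example of why this kind of work belongs offline: a palette shading table is a nearest-colour search for every (light level, colour) pair. A hypothetical C sketch, simplified to a single brightness value instead of coloured light and not AxisOxy's actual table format. With 16 levels and a 256-colour palette it performs 16 * 256 * 256 = 1,048,576 distance evaluations:

```c
#include <assert.h>
#include <stdint.h>

#define SHADE_LEVELS 16   /* hypothetical number of light levels */

typedef struct { uint8_t r, g, b; } Rgb;

/* Index of the palette entry closest to (r, g, b), by squared RGB
   distance (no sqrt needed just to compare distances). */
static int nearest_colour(const Rgb *pal, int pal_size, int r, int g, int b)
{
    int best = 0;
    long best_d = -1;
    for (int i = 0; i < pal_size; i++) {
        long dr = pal[i].r - r, dg = pal[i].g - g, db = pal[i].b - b;
        long d = dr * dr + dg * dg + db * db;
        if (best_d < 0 || d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}

/* Offline step: table[level][i] is the palette index that best matches
   colour i scaled to level / (SHADE_LEVELS - 1) of its brightness. */
static void build_shade_table(const Rgb *pal, int pal_size,
                              uint8_t table[SHADE_LEVELS][256])
{
    assert(pal_size > 0 && pal_size <= 256);
    for (int level = 0; level < SHADE_LEVELS; level++) {
        for (int i = 0; i < pal_size; i++) {
            int r = pal[i].r * level / (SHADE_LEVELS - 1);
            int g = pal[i].g * level / (SHADE_LEVELS - 1);
            int b = pal[i].b * level / (SHADE_LEVELS - 1);
            table[level][i] = (uint8_t)nearest_colour(pal, pal_size, r, g, b);
        }
    }
}
```

At runtime, shading a texel is then a single lookup, `table[level][texel]`, which is why paying the search cost once in a converter is attractive.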
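The "transform all, cull all, render the rest" order can be sketched in C as two flat passes over linear arrays, the kind of interface that is easy to hand to an ASM or DSP routine. The types and names are hypothetical, the camera is assumed to sit at the view-space origin, and which winding counts as front-facing is an assumed convention:

```c
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;
typedef struct { float m[3][4]; } Mat34;  /* 3x3 rotation plus translation column */
typedef struct { int v[3]; } Tri;         /* indices into the vertex array */

/* Stage 1: transform a whole linear array of vertices in one tight loop. */
static void transform_all(const Mat34 *M, const Vec3 *in, Vec3 *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        Vec3 p = in[i];
        out[i].x = M->m[0][0] * p.x + M->m[0][1] * p.y + M->m[0][2] * p.z + M->m[0][3];
        out[i].y = M->m[1][0] * p.x + M->m[1][1] * p.y + M->m[1][2] * p.z + M->m[1][3];
        out[i].z = M->m[2][0] * p.x + M->m[2][1] * p.y + M->m[2][2] * p.z + M->m[2][3];
    }
}

/* Stage 2: drop back faces. With the camera at the origin, a triangle
   faces it when its normal (b - a) x (c - a) points towards the origin,
   i.e. dot(normal, a) < 0. Returns the number of triangles kept. */
static size_t cull_backfaces(const Vec3 *v, const Tri *tris, size_t ntris, Tri *kept)
{
    size_t k = 0;
    for (size_t i = 0; i < ntris; i++) {
        Vec3 a = v[tris[i].v[0]], b = v[tris[i].v[1]], c = v[tris[i].v[2]];
        float e1x = b.x - a.x, e1y = b.y - a.y, e1z = b.z - a.z;
        float e2x = c.x - a.x, e2y = c.y - a.y, e2z = c.z - a.z;
        float nx = e1y * e2z - e1z * e2y;
        float ny = e1z * e2x - e1x * e2z;
        float nz = e1x * e2y - e1y * e2x;
        if (nx * a.x + ny * a.y + nz * a.z < 0.0f)
            kept[k++] = tris[i];
    }
    return k;
}

/* Stage 3 (not shown): hand kept[0..k) to the rasterizer. */
```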
AxisOxy wrote:Come to think of it, this would also be a very good structure for an engine on modern hardware, like PCs or game consoles. It would make it much easier to parallelize work (SIMD, threads, GPU programming).
AxisOxy wrote:Lately I've done some stuff for A500 OCS machines (the real Amiga). Our latest release there is Planet Rocklobster.
AxisOxy wrote:Another thing on my todo list is an Atari ST demo. I have ported a handful of A500 effects to the ST and have a lot of good friends in the Atari scene. So there's something to come there too. Someday!
AxisOxy wrote:Always the same problem, so many ideas for projects and so little time to do them.
AxisOxy wrote:Yeah, the weather here in northern Germany is also awesome. So I'm actually coding less and BBQ'ing more. ;o)
AxisOxy wrote:But I need to code something in the next few days. I just got a great idea for speeding up the perspective texture mapping that's begging to be tested. It's mostly based on rendering the polys in 8x8 blocks and doing the expensive perspective-correction work only on an 8x8 grid over the area the polygon covers. Not sure how this looks quality-wise compared with 16-pixel spans. I also have no clue how to integrate it with the coverage buffer. Perhaps I'll try to handle the coverage in 8x8 blocks as well, allowing a little bit of overdraw along the edges. We will see.
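A rough C sketch of the 8x8 idea, under stated assumptions: u/z, v/z and 1/z are treated as planes in screen space (which holds for a planar polygon under perspective projection), the divides are done only at block corners, and u and v are bilinearly interpolated inside each block. `Plane`, `exact_uv` and `block_uv` are hypothetical names, and coverage and polygon edges are ignored entirely:

```c
#define BLOCK 8

/* Screen-space plane: f(x, y) = c + dx * x + dy * y. */
typedef struct { float c, dx, dy; } Plane;

static float plane_at(Plane p, float x, float y)
{
    return p.c + p.dx * x + p.dy * y;
}

/* Perspective-correct (u, v) at one screen position: the expensive part. */
static void exact_uv(Plane uz, Plane vz, Plane iz, float x, float y,
                     float *u, float *v)
{
    float w = 1.0f / plane_at(iz, x, y);
    *u = plane_at(uz, x, y) * w;
    *v = plane_at(vz, x, y) * w;
}

/* Fills u_out/v_out (BLOCK x BLOCK, row-major) for the block whose top-left
   pixel is (bx, by): exact values at the 4 corners, bilinear inside. */
static void block_uv(Plane uz, Plane vz, Plane iz, int bx, int by,
                     float u_out[BLOCK * BLOCK], float v_out[BLOCK * BLOCK])
{
    float u00, v00, u10, v10, u01, v01, u11, v11;
    exact_uv(uz, vz, iz, bx,         by,         &u00, &v00);
    exact_uv(uz, vz, iz, bx + BLOCK, by,         &u10, &v10);
    exact_uv(uz, vz, iz, bx,         by + BLOCK, &u01, &v01);
    exact_uv(uz, vz, iz, bx + BLOCK, by + BLOCK, &u11, &v11);

    for (int j = 0; j < BLOCK; j++) {
        float ty = (float)j / BLOCK;
        /* Left and right edge values for this row of the block. */
        float ul = u00 + (u01 - u00) * ty, ur = u10 + (u11 - u10) * ty;
        float vl = v00 + (v01 - v00) * ty, vr = v10 + (v11 - v10) * ty;
        for (int i = 0; i < BLOCK; i++) {
            float tx = (float)i / BLOCK;
            u_out[j * BLOCK + i] = ul + (ur - ul) * tx;
            v_out[j * BLOCK + i] = vl + (vr - vl) * tx;
        }
    }
}
```

In a real rasterizer each corner value would be computed once per grid point and shared by the four blocks touching it, so the per-pixel cost is only the interpolation.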
AxisOxy wrote:Interesting to see your benchmark. I'm wondering a bit why so much time is spent in _R_GeomReindexBatch, _R_GeomSubmitBatch and _R_CommitFaceGeometry. My first educated guess is data transfer to and from the DSP.
AxisOxy wrote:I attached a benchmark of my engine. Not as detailed as yours, but it shows the bottlenecks.
It uses the worst-case frame of "Outer Base" (Quake 2) on an Amiga 1200/Blizzard 1230 @ 50 MHz with flat-shaded polys.
I normally use flat shading for benchmarks because it shows the problems more clearly. Rendering a frame with a fullscreen wall or floor fast is easy. But a frame with 500+ faces is a totally different story.
AxisOxy wrote:A few posts ago, you mentioned that you did optimizations for the 68030 data cache. So I guess the data cache on the Falcon 030 actually works. Amiga 1200 68030 boards all suffer from the same issue: the data cache doesn't work at all. Or better said, it works (it causes problems with SMC), but there's not a single cycle it saves. Darn it!
dml wrote:and sidesteps the need to distribute 'copyrighted stuff' in preconverted form
dml wrote:That seems like a worthy experiment. Looking beyond the obvious coding difficulty, it probably will produce decent visual results and some gain at the same time.
dml wrote:Long ago I tried an adaptive (subdivision) approach to horizontal spans (again with DSP) but quickly found that it did not map well to perspective curves - it either required an unreasonable amount of subdivision (small error tolerance), which defeated the gains, or caused visual popping which was distracting.
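To make that trade-off concrete: along one scanline u/z and 1/z are linear in x, so the true u is their ratio and a straight line only approximates it. This hypothetical C sketch counts how many affine pieces a span needs when each piece is split until linear interpolation at its midpoint is within `tol` texels. The midpoint test is a heuristic (the error elsewhere in a piece can be larger), and `SpanGrad` / `count_segments` are illustrative names, not from either engine:

```c
/* Along a scanline: u/z = uz0 + duz * x and 1/z = iz0 + diz * x. */
typedef struct { float uz0, duz, iz0, diz; } SpanGrad;

/* Perspective-correct u at screen x. */
static float span_u(const SpanGrad *g, float x)
{
    return (g->uz0 + g->duz * x) / (g->iz0 + g->diz * x);
}

/* Number of affine segments needed for [x0, x1] so that each segment's
   midpoint error is within tol, the depth limit is hit, or the segment
   is a single pixel or less. */
static int count_segments(const SpanGrad *g, float x0, float x1,
                          float tol, int depth)
{
    float xm = 0.5f * (x0 + x1);
    float lin = 0.5f * (span_u(g, x0) + span_u(g, x1));
    float err = lin - span_u(g, xm);
    if (err < 0.0f)
        err = -err;
    if (err <= tol || depth == 0 || x1 - x0 <= 1.0f)
        return 1;
    return count_segments(g, x0, xm, tol, depth - 1) +
           count_segments(g, xm, x1, tol, depth - 1);
}
```

Tightening `tol` can only add splits, so the segment count grows as the tolerance shrinks: that is the choice between "an unreasonable amount of subdivision" and visible popping.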
dml wrote:It is safe to say that it does suck on the Falcon, but for obvious reasons: word fetches become longword fetches over a 16-bit bus.
dml wrote:One other thing I forgot to mention is that I introduced a 5th clipping plane (nearclip), which isn't present in Q2. At first this just added to overall cost, but after many attempts at optimizing the BSP routines it began to save more than it cost.
Code:
;Address an interleaved int* linebuffer. Can be done for free with (a1,d3.l*8)
;Looks like compilers don't know about 16 or even 32 bit.
;WTF! Didn't know that this even works.
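The comment above relies on the 68020+ scaled-index addressing mode, which can scale an index register by 1, 2, 4 or 8. A hypothetical C layout with an 8-byte stride per scanline (left and right x side by side), written both as a struct array and as a raw interleaved `int32_t` buffer. Whether a given compiler actually emits `(a1,d3.l*8)` for it is not guaranteed:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interleaved line buffer: per scanline, the left and right
   x of the span, so entry y starts y * 8 bytes into the buffer. */
typedef struct {
    int32_t xl;
    int32_t xr;
} LineSpan;

/* Struct form: lines[y] is base + y * 8, which maps naturally onto a
   scaled index of 8 (xl at offset 0, xr at offset 4). */
static int32_t span_width(const LineSpan *lines, int32_t y)
{
    assert(y >= 0);
    return lines[y].xr - lines[y].xl;
}

/* Raw interleaved int32_t form of the same layout: element 2*y is the
   left edge and 2*y + 1 the right edge, again a stride of 8 bytes. */
static int32_t span_width_raw(const int32_t *buf, int32_t y)
{
    assert(y >= 0);
    return buf[2 * y + 1] - buf[2 * y];
}
```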
AxisOxy wrote:I started to port more code from C to ASM, and SAS/C is starting to impress me.
AxisOxy wrote:Oh yeah, that's a problem I haven't really thought about. Not a problem yet, because I am a gazillion light years away from a release, but it needs some thought in the future.
AxisOxy wrote:Did some quick tests. The texture mapping itself works quite well (quality- and speed-wise) compared to the span-based approach. But I still have no idea how to solve the coverage problem, because block-wise rendering is totally incompatible with typical coverage buffer implementations. So, for the moment, it's moved back onto the stack of ideas and prototypes until I get the right inspiration.
AxisOxy wrote:Yeah, I did the same back in the 90s for a PC 3D engine (FPU+MMX), when I worked for Software 2000. I know the popping and wiggling you get when you use this kind of tech on typical Doom-/Quake-style maps. But for our use case (a soccer game) it worked pretty well: a big grass surface at the right angle, and everything else on screen has rather small polygons or is pretty far away.
AxisOxy wrote:Uh, that sounds really bad. So your data cache even slows things down. *LOL*
AxisOxy wrote:After a little bit of research I found the reason for the data cache problem on the Amiga. It looks like it only works in write-allocate mode, due to the way the data bus on the Amiga works. I did a bit of testing and found some edge cases where it saves 1 cycle. But these cases are so rare that I'm pretty sure they will never happen in real-world scenarios, and definitely not by accident. So, let's put the Amiga and Falcon hardware designers in a bag and use a big club on them. It won't hit the wrong person.
AxisOxy wrote:Ah, yes. I also clip against the near plane. I mostly added that for accuracy reasons. Without it, I only had the choice between clipping bugs on very big faces or ugly jittering in the subpixel-accurate edges, due to extremely big values in the edge setup. In the end, it even saved some cycles. While we are at it: I only clip the near plane in 3D. The other 4 clip planes are only used for frustum culling, and the x/y clipping is done in 2D during edge setup (in y) and rasterization (in x).
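With only the near plane clipped in 3D, the 3D clipper reduces to one Sutherland-Hodgman pass against `z = znear`. A minimal C sketch for a convex polygon with positions only; `Vtx` and `clip_near` are hypothetical names, and a real version would interpolate texture coordinates with the same `t`:

```c
#include <assert.h>

typedef struct { float x, y, z; } Vtx;

/* Clips a convex polygon (n vertices) against the near plane, keeping the
   part with z >= znear. out[] must hold n + 1 vertices. Returns the new
   vertex count (0 if the polygon is entirely behind the plane). */
static int clip_near(const Vtx *in, int n, float znear, Vtx *out)
{
    assert(n >= 3);
    int count = 0;
    for (int i = 0; i < n; i++) {
        Vtx a = in[i], b = in[(i + 1) % n];
        int a_in = a.z >= znear, b_in = b.z >= znear;
        if (a_in)
            out[count++] = a;
        if (a_in != b_in) {
            /* Edge crosses the plane: emit the intersection point. */
            float t = (znear - a.z) / (b.z - a.z);
            Vtx p = { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, znear };
            out[count++] = p;
        }
    }
    return count;
}
```

Clipping a triangle with one vertex behind the plane yields a quad, which is why the output buffer needs one extra slot.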
AxisOxy wrote:;WTF! Didn't know that this even works.
Code:
[r = real value] [ye = expected sqrt(r)] [ya = actual] [e = error; only rows with error >= 0.01% shown]
r:0.4277 ye:0.6540 ya:0.653809 e:0.02%
r:0.0586 ye:0.2421 ya:0.241943 e:0.06%
r:0.1701 ye:0.4124 ya:0.412109 e:0.08%
r:0.2395 ye:0.4893 ya:0.489258 e:0.02%
r:0.3707 ye:0.6088 ya:0.608398 e:0.07%
r:0.3865 ye:0.6217 ya:0.621826 e:0.02%
r:0.1175 ye:0.3427 ya:0.342529 e:0.06%
r:0.3985 ye:0.6313 ya:0.631104 e:0.03%
r:0.4556 ye:0.6750 ya:0.675293 e:0.04%
r:0.3674 ye:0.6061 ya:0.605957 e:0.03%
r:0.2812 ye:0.5302 ya:0.530029 e:0.04%
r:0.1479 ye:0.3846 ya:0.384521 e:0.01%
r:0.3074 ye:0.5544 ya:0.554199 e:0.04%
r:0.0485 ye:0.2203 ya:0.219971 e:0.16%
r:0.4198 ye:0.6479 ya:0.647705 e:0.03%
r:0.0109 ye:0.1045 ya:0.104248 e:0.19%
r:0.4058 ye:0.6370 ya:0.636475 e:0.08%
r:0.1180 ye:0.3434 ya:0.343262 e:0.05%
r:0.3377 ye:0.5812 ya:0.580811 e:0.06%
r:0.2195 ye:0.4685 ya:0.468262 e:0.05%
r:0.3707 ye:0.6088 ya:0.608398 e:0.07%
r:0.0187 ye:0.1369 ya:0.136719 e:0.12%
dml wrote:So the other thing I had been working on is an alternate way to perform square-root operations for real-time 3D.
After some experiments, I developed a solution which closely approximates a 23-bit fixed-point sqrt() in just 10 cycles.
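The thread doesn't describe how the 10-cycle version works, so the following is not dml's method. It is a sketch of one common way to get cheap square roots, paired with an error check in the spirit of the table above: range-reduce the input by powers of four into [0.25, 1), look up a 257-entry table, and interpolate linearly. The table size and all names are assumptions; `ref_sqrt` uses Newton-Raphson only so the example builds without libm:

```c
#include <assert.h>

#define SQRT_TABLE_SIZE 256

/* sqrt(x) sampled uniformly on [0.25, 1.0], SQRT_TABLE_SIZE intervals. */
static double sqrt_table[SQRT_TABLE_SIZE + 1];

/* Reference square root by Newton-Raphson (avoids needing libm). */
static double ref_sqrt(double x)
{
    if (x <= 0.0)
        return 0.0;
    double y = x > 1.0 ? x : 1.0;      /* start at or above the root */
    for (int i = 0; i < 64; i++)
        y = 0.5 * (y + x / y);
    return y;
}

static void init_sqrt_table(void)
{
    for (int i = 0; i <= SQRT_TABLE_SIZE; i++)
        sqrt_table[i] = ref_sqrt(0.25 + 0.75 * i / SQRT_TABLE_SIZE);
}

/* Approximate sqrt(r): scale r by powers of 4 into [0.25, 1), interpolate
   in the table, then undo the scaling with the matching power of 2. */
static double approx_sqrt(double r)
{
    assert(r >= 0.0);
    if (r == 0.0)
        return 0.0;
    double scale = 1.0;
    while (r < 0.25) { r *= 4.0;  scale *= 0.5; }
    while (r >= 1.0) { r *= 0.25; scale *= 2.0; }
    double pos = (r - 0.25) / 0.75 * SQRT_TABLE_SIZE;
    int i = (int)pos;
    if (i >= SQRT_TABLE_SIZE)
        i = SQRT_TABLE_SIZE - 1;
    double frac = pos - i;
    return scale * (sqrt_table[i] + frac * (sqrt_table[i + 1] - sqrt_table[i]));
}

/* Relative error in percent, comparable to the e: column above. */
static double rel_error_pct(double r)
{
    double exact = ref_sqrt(r);
    double err = (approx_sqrt(r) - exact) / exact;
    return 100.0 * (err < 0.0 ? -err : err);
}
```

With h = 0.75/256, linear interpolation error on [0.25, 1) is bounded by h^2/8 * max|f''| (max|f''| = 2 at 0.25), about 2.1e-6 absolute or roughly 0.0004% relative, since range reduction keeps the table away from the steep region near zero.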
alexh wrote:Presumably these methods are similar to the 5 DSP-instructions per bit routine you started with?
Code:
sqrt macro xysqr,xyroot,Txy
tfr b,a #<0,xyroot ; : pattern-accumulator
lsr b a,Txy ; shift trial bit : new trial pattern
mpy Txy,Txy,a ; trial (x*x)
cmp xysqr,a xyroot,a ; (x*x)>a? : restore pattern-acc for update
tle Txy,a ; condition update pattern-acc
add b,a a,xyroot ; combine bit : save updated pattern-acc
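The macro above is a trial-squaring square root: set a trial bit, square it, compare against the input, and keep the bit if the square doesn't exceed it. For readers who don't read DSP56000 assembly, here is the same idea in C, one result bit per loop iteration, assuming unsigned Q23 fixed point (1.0 = 1 << 23) and inputs below 1.0. It is not a line-by-line translation: the loop setup around the macro isn't shown in the post, and the DSP works in signed fractional arithmetic:

```c
#include <assert.h>
#include <stdint.h>

#define Q23_ONE (1u << 23)   /* 1.0 in unsigned Q23 */

/* Returns floor(sqrt(x / 2^23) * 2^23) for x in Q23, i.e. sqrt in Q23. */
static uint32_t sqrt_q23(uint32_t x)
{
    assert(x < Q23_ONE);
    /* sqrt(x / 2^23) * 2^23 == sqrt(x * 2^23), so work on n = x << 23. */
    uint64_t n = (uint64_t)x << 23;
    uint32_t root = 0;

    /* One result bit per iteration, MSB first: 23 bits for x < 1.0. */
    for (uint32_t bit = 1u << 22; bit != 0; bit >>= 1) {
        uint32_t trial = root | bit;              /* set the trial bit */
        if ((uint64_t)trial * trial <= n)         /* square and compare */
            root = trial;                         /* keep the bit */
    }
    return root;
}
```

Each iteration costs one multiply, one compare and one conditional update, which mirrors the per-bit mpy / cmp / tle pattern in the macro.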
dml wrote:While it is nearly always possible to beat the compiled code, and sometimes trivial to do so, at other times it is actually quite difficult if the code is even slightly convoluted.
dml wrote:I didn't realize this was going on alongside my effort on the Atari.
dml wrote:Although I ran into problems when I found there were nonconvex polys in the maps and broke the 2D clip exit/entry ordering logic.
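Nonconvex input can be caught before it reaches a clipper whose entry/exit ordering assumes each clip line is crossed at most twice, which is only guaranteed for convex polygons. A hypothetical C check suited to an offline converter: a polygon is convex when every turn (cross product of consecutive edges) has the same sign. It does not detect self-intersecting shapes such as a pentagram, whose turns all share one sign:

```c
#include <assert.h>

typedef struct { float x, y; } P2;

/* Returns 1 if the polygon p[0..n) is convex (collinear vertices are
   tolerated), 0 if any two turns have opposite signs. */
static int is_convex(const P2 *p, int n)
{
    assert(n >= 3);
    int sign = 0;
    for (int i = 0; i < n; i++) {
        P2 a = p[i], b = p[(i + 1) % n], c = p[(i + 2) % n];
        /* z component of (b - a) x (c - b): the turn direction at b. */
        float cross = (b.x - a.x) * (c.y - b.y) - (b.y - a.y) * (c.x - b.x);
        if (cross > 0.0f) {
            if (sign < 0)
                return 0;
            sign = 1;
        } else if (cross < 0.0f) {
            if (sign > 0)
                return 0;
            sign = -1;
        }
    }
    return 1;
}
```

A converter could split any polygon that fails this test into triangles, which are always convex.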