Quake 2 on Falcon030

dml
Fuji Shaped Bastard
Posts: 3988
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

Zamuel_a wrote: Very impressive that you got it to work. Do you filter the lightmap when you scale it or just combine it as it is? (sharp shadows wouldn't look so good).
There's a slight advantage on the Falcon in that the PC version has to do a table translation to get an approx 8-bit colour for the original 8-bit pixel combined with the light level on each surface pixel. This means the lit pixels are re-quantized and some error creeps into the shading/lighting.

On the Falcon you can translate directly from 8bit colour + light -> RGB16 so you get nice smooth lighting :) You can do this direct from the original 24bit game palette.
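
As a rough illustration of that direct translation, here is a minimal C sketch of a table mapping (light level, 8-bit colour) straight to a 16-bit pixel. The number of light levels, the linear light scale, the 5:6:5 packing and the names `build_lit_table` and `LIGHT_LEVELS` are all assumptions made for the example, not details taken from the engine.

```c
#include <stdint.h>

#define LIGHT_LEVELS 32  /* assumed number of light steps, not from the engine */

/* Pack 8-bit R, G, B into a 16-bit 5:6:5 word (assumed pixel layout). */
static uint16_t pack_rgb565(unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Build table[light][colour]. Each entry is the 24-bit palette colour scaled
 * by the light level and quantized once, directly to 16 bits, so there is no
 * second trip through an 8-bit palette.
 * palette holds 256 RGB triplets (768 bytes). */
static void build_lit_table(const uint8_t *palette,
                            uint16_t table[][256])
{
    for (unsigned light = 0; light < LIGHT_LEVELS; light++) {
        for (unsigned c = 0; c < 256; c++) {
            unsigned r = palette[c * 3 + 0] * light / (LIGHT_LEVELS - 1);
            unsigned g = palette[c * 3 + 1] * light / (LIGHT_LEVELS - 1);
            unsigned b = palette[c * 3 + 2] * light / (LIGHT_LEVELS - 1);
            table[light][c] = pack_rgb565(r, g, b);
        }
    }
}
```

A lit pixel is then a single lookup, `table[light][texel]`, with no re-quantization back to 8 bits.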

I was going to attempt colour lighting like the GL version, but I see now that it would only work for static lightmaps. It would be difficult to make it work with dynamic lights, which themselves need to be precombined before combining with the texture. Still, I'll probably do a static version to see how it looks, as it will probably be nicer visually.

Yes, the lightmaps are filtered while they are being combined. There's a specific filter routine for each miplevel. It's done as tiles so the routines are hardcoded to 2x2, 4x4, 8x8 and 16x16 versions, making them easier to optimize. It's still going to be faster not to filter, but we'll see how it goes.

Re: Quake 2 on Falcon030

Post by dml »

Another speedup is on the way. After fixing most of the correctness issues and getting a stable render at all z-distances, I tried dropping the span arithmetic from 48bit to 24bit effective, and got nearly the same result. So there will soon be a 6x reduction in the amount of code needed to set up each span, and a 50% reduction in data transmitted to DSP per face - which is nice :)

Some sparkles did creep back in when testing this but most can probably still be suppressed back out.

I've not done much else because I still have flu. Still a lot of optimization work needed to get a decent idea of final speed. I've increased the window size again to 256x128. The 2:1 ratio is mainly due to a bug in the projection code where the fine ratio causes overflow, so I took it out until later.
Mindthreat
Captain Atari
Posts: 279
Joined: Tue Dec 16, 2014 4:39 am

Re: Quake 2 on Falcon030

Post by Mindthreat »

Anxiously looking forward to another video with all recent optimizations in place :D

Cheers! :cheers:
Atari-related YouTube Videos Here: - https://www.youtube.com/channel/UCh7vFY ... VqA/videos
Atari ramblings on Twitter Here: https://twitter.com/mindthreat

Re: Quake 2 on Falcon030

Post by dml »

I was thinking about doing another vid but the performance is still quite bad in two areas (surface cache, face setup) so I've decided to just keep working on it a bit longer until those have been improved. It runs well if you stand and look at some walls, but it still gets slow if you run about and/or look across a room.

I think the most difficult parts are done though, as far as I had planned anyway, for scenery - just a lot of rewriting and optimization work needed on C and floating point code.
Scarlettkitten
Captain Atari
Posts: 262
Joined: Thu Mar 19, 2009 11:42 am
Location: Northamptonshire, UK

Re: Quake 2 on Falcon030

Post by Scarlettkitten »

Keep it up Doug, I love reading about this 8)
My musical dribbles 🎶 https://sophie-rose.bandcamp.com
Mega ST4, 520STM.

Re: Quake 2 on Falcon030

Post by dml »

Thanks!

I had a half-hearted go at block-replacing the last FPU code which is slowing things down, but it didn't work at all - just a big mess on the screen. So I'll leave it alone for now and try again at the weekend, doing it in smaller steps next time.

Re: Quake 2 on Falcon030

Post by dml »

Still not recovered from flu, so not much happening code-wise.

Here is a block diagram showing the F030 engine - by now quite different from the PC version, and some necessary adaptations for the tiny memory model of the DSP.

The blue blocks are 68030-only (or data held in system RAM), the orange blocks DSP-only, and the green blocks are shared processes between the two.
Q2_F30 Engine Architecture.pdf

Re: Quake 2 on Falcon030

Post by Mindthreat »

dml wrote: Still not recovered from flu, so not much happening code-wise.

Here is a block diagram showing the F030 engine - by now quite different from the PC version, and some necessary adaptations for the tiny memory model of the DSP.

The blue blocks are 68030-only (or data held in system RAM), the orange blocks DSP-only, and the green blocks are shared processes between the two.
Q2_F30 Engine Architecture.pdf
Wow! :thumbs:

Re: Quake 2 on Falcon030

Post by dml »

I made a tiny bit of progress on removing the last chunk of FPU stuff from rendering.

Really there are two chunks left and the first one is being translated to 68k integer-clean code. The first attempt was a disaster and basically blew up, but last night I managed to get it moving in the right direction with textures approximately following their assignments on walls. It wobbles a lot but this problem can be resolved with some effort.

The main problem with my first attempt was forgetting that I had pre-formatted some of the model geometry to use 24bit fixedpoint for the sake of the DSP, and then trying to mix this with 16/32 arithmetic in 68k without the necessary corrections.
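
To show the kind of correction involved, here is a minimal C sketch of converting a 24-bit fixed-point value into 16.16 before mixing it with CPU-side arithmetic. The formats are assumptions for the example (a signed 1.23 fraction on the DSP side, 16.16 on the 68k side), and the helper names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend a 24-bit two's-complement value held in the low bits of a
 * 32-bit word. */
static int32_t sext24(uint32_t raw)
{
    return (int32_t)((raw & 0xFFFFFFu) ^ 0x800000u) - 0x800000;
}

/* Assumed formats: the input is a signed 1.23 fraction, the output is 16.16.
 * Dropping 23 - 16 = 7 fraction bits lines the binary points up. In 68k
 * assembly this would be an arithmetic shift; the division keeps the C
 * portable (it rounds toward zero rather than down). */
static int32_t q23_to_q16(uint32_t raw)
{
    return sext24(raw) / 128;
}
```

Using the raw 24-bit word directly as if it were 16.16 would leave every value wrong by a factor of 128 and lose the sign of negative values.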

Once I'm satisfied that precision is sufficient I'll consider more carefully how to implement a final version and which chips should be involved.


The second block of FPU code will more likely just be moved directly onto the DSP, but will leave this until later. It will be too hard to debug if done ahead of time. Half of the problem with this stuff is doing things in the right order so problems can be identified and stamped out before making too many other changes.

Removing the FPU code probably won't provide a magic speedup on its own, but it will equalize the Hatari version with real HW (so far, real HW has been slower for scenery, but faster for pixel drawing) and make proper optimization possible, which so far has been a bottleneck for complex scenery.

Re: Quake 2 on Falcon030

Post by dml »

Managed to fix the wobble and got it working 99% correctly with 25% of the remaining FPU code turned into plain 68k.

This did produce more sparkles so I'll probably have to work on it more before trying to convert the rest. But it looks promising.
grab0086.png

Re: Quake 2 on Falcon030

Post by Scarlettkitten »

That's an impressive number up top left. 8)

Re: Quake 2 on Falcon030

Post by dml »

:)

Gradually catching up with BM, but some distance still to cover.
shoggoth
Nature
Posts: 1447
Joined: Tue Aug 01, 2006 9:21 am
Location: Halmstad, Sweden

Re: Quake 2 on Falcon030

Post by shoggoth »

Am I wrong to think this could fly on e.g. a CT2?
Ain't no space like PeP-space.

Re: Quake 2 on Falcon030

Post by dml »

shoggoth wrote: Am I wrong to think this could fly on e.g. a CT2?
I imagine it should, yes.

There is a constant-time limit on fillrate, but this is to be controlled by adjusting window size and 'chunky modes' on x/y... one of the benefits of maintaining zero overdraw - it restricts overall impact of framebuffer on performance. There are no other fixed limitations on speed - just the performance of the main chips and the ram itself.
Last edited by dml on Sat Feb 21, 2015 5:23 pm, edited 2 times in total.

Re: Quake 2 on Falcon030

Post by dml »

Is there anyone out there who has experience building maps for Q1, Q2 or Q3, and some interest in this project? If so, let me know. PM or otherwise.

Re: Quake 2 on Falcon030

Post by dml »

I found the reason for the ugly texture sparkles which crept in after converting the previous bit of FPU code.

The camera projection width:height ratios involve a scaling by (1.0/(width/2)), which in this case was 1/128, or a fixedpoint multiply by $0200 at a sensitive part of texture plane setup. Due to a 1-bit rounding error this was coming out at $01FF and pushing the texture UVs out quite a bit. This was easily fixed by rounding the terms properly before starting the render pass. There wasn't anything fundamentally wrong with the replacement of FPU code itself or precision required.
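
A small C sketch of this failure mode, assuming the scale factor is stored with a 16-bit fraction (so 1/128 is $0200) and arrives fractionally under its true value from earlier arithmetic. Where the error actually came from in the engine is not stated above, so the helper names and test values are purely illustrative.

```c
#include <stdint.h>

/* Convert a value in [0,1) to 16-bit-fraction fixed point by truncation.
 * A value a hair under 1/128 lands on $01FF instead of $0200. */
static uint16_t to_fix16_trunc(double x)
{
    return (uint16_t)(x * 65536.0);
}

/* The same conversion with round-to-nearest. */
static uint16_t to_fix16_round(double x)
{
    return (uint16_t)(x * 65536.0 + 0.5);
}

/* Multiply a non-negative value by a 16-bit-fraction scale factor. */
static int32_t scale_fix16(int32_t v, uint16_t s)
{
    return (int32_t)(((int64_t)v * s) / 65536);
}
```

With an input of 0x100000, the $0200 factor gives 8192 while $01FF gives 8176, an error of about 0.2% that is easily visible once it feeds into texture UVs.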

It's also clear now that a bunch of big expensive multiplies in this area can be collapsed into shifts, or cancelled out completely with normalization, if the window size is kept at 2^n (128 or 256 really). This is irrelevant on the DSP, but I'm making notes in case it's better to run this in parallel on the CPU+cache during scan conversion.
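
As a sketch of the shift idea in C: when the window width is a power of two, scaling by 1/(width/2) needs no multiply at all. The function names are hypothetical and the values are treated as unsigned for simplicity.

```c
#include <stdint.h>

/* Return log2(n) if n is a power of two, otherwise -1. */
static int log2_exact(uint32_t n)
{
    int shift = 0;
    if (n == 0 || (n & (n - 1)) != 0)
        return -1;
    while ((n >>= 1) != 0)
        shift++;
    return shift;
}

/* Scale value by 1/(width/2). For a power-of-two width this is a plain
 * right shift (256 -> >>7, 128 -> >>6); other widths fall back to a divide. */
static uint32_t scale_by_half_width(uint32_t value, uint32_t width)
{
    int shift = log2_exact(width);
    if (shift >= 1)
        return value >> (shift - 1);
    return width >= 2 ? value / (width / 2) : value;
}
```

For a 256-wide window, `scale_by_half_width(0x10000, 256)` returns 0x200, the same factor discussed above, with a single shift.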

Re: Quake 2 on Falcon030

Post by dml »

50% of the last FPU code is now fixedpoint 68k, with so far very little loss of precision. This accounts for all of the texture plane setup calculations which can be shared between faces.

The other 50% must be done per face and relates to mipmap scaling and conversion to screen i,j stepping deltas. This is all multiplies+adds, so I'm fairly sure it will convert ok.

Re: Quake 2 on Falcon030

Post by dml »

There was a bit more progress at the weekend.

100% of the FPU code is now gone from within the rendering code.

What remains is now a mixture of 68030 for 3D transforms & intersections on shared texture planes - this is still a bit expensive - and DSP, for the per-face setup of i,j stepping terms. It has got a bit faster because of these changes but it needs quite a lot of optimization work and some fixes to correct precision loss.

The 68030 part can probably also be DSP'd because it's ideally suited for that. It just needs to be done very carefully in stages, like everything else so far.

Re: Quake 2 on Falcon030

Post by dml »

So two of the primary costs have pretty much reached their limit now on 16MHz F030.

Overhead for each textured span is now zero on the DSP side - invisible. Setup is all done in the time it takes the 68030 to loop back and begin a new span - itself a small number of instructions. There is still room to do a couple of things here on the 030 side but they are very minor.

Pixel filling is also very close to the limit for this @ 2 ops per pixel on the CPU. There is still a slightly faster way to do this on real HW but the difference is small and I don't see a practical way to make it work just yet so I'm going to stick with this version unless that situation changes.

I now get around 12fps best case at 256x128, when staring at a wall. 6-9fps is more typical for a basic room when the surface cache is filled. 4-5fps in dense hotspots. It can still get slower than this when the camera is moving and the surface cache gets invalidated.


So the two main bottlenecks remaining:

- Surface cache filling. A consequence of moving the camera so new content comes in view, where new (or more detailed) textures are needed for a change in proximity. I have a number of strategies to deal with this one, a few tried already and some waiting to be tried. But it also needs optimization work.

- Per-face setup still too expensive. Much better than it was but still at least three matrix->vector transforms per face to be absorbed, best moved to DSP.


Surface cache strategies:

Surface cache is filled in using a tile loop, where each tile is 2x2, 4x4, 8x8 or 16x16 depending on miplevel. The tiles perform box filtering on the lightmap while combining with texturemap. The tiles are optimized into 68k but the tile loop is still C. It needs to be converted and optimized properly.
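
For readers unfamiliar with the idea, a single tile fill might look roughly like this in C. This is only a sketch: it assumes the filter interpolates between the four lightmap samples at the tile corners, that light values are 0..255, and that a 32-level lookup table maps (light, texel) to a 16-bit pixel. The real routines are hand-written 68k and may work quite differently.

```c
#include <stdint.h>

/* Fill one size x size tile of the surface cache.
 * tex      : 8-bit texels (palette indices)
 * l00..l11 : light at the tile corners, 0..255 (first digit = x, second = y)
 * lut      : assumed table of 32 light levels x 256 colours -> 16-bit pixel */
static void fill_tile(const uint8_t *tex, int tex_stride,
                      uint16_t *dst, int dst_stride, int size,
                      int l00, int l10, int l01, int l11,
                      const uint16_t *lut)
{
    for (int y = 0; y < size; y++) {
        /* Light at the left and right edges of this row. */
        int left  = l00 + (l01 - l00) * y / size;
        int right = l10 + (l11 - l10) * y / size;
        for (int x = 0; x < size; x++) {
            /* Interpolated light for this texel, still 0..255. */
            int light = left + (right - left) * x / size;
            /* light >> 3 selects one of the 32 rows of the table. */
            dst[y * dst_stride + x] =
                lut[(light >> 3) * 256 + tex[y * tex_stride + x]];
        }
    }
}
```

Hardcoding `size` to 2, 4, 8 or 16 lets the inner loops be fully unrolled, which is presumably what makes the per-miplevel routines easier to optimize.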

One strategy is to just avoid filtering on the smallest 2x2 tilesize. This is useful because it is used much more often than the other sizes due to projection of sizes (distant objects are always more numerous than near ones), and filtering cost is higher relative to pixels written.

Another is to begin to back away from the just-in-time approach to preparing surfaces, which guarantees a surface is ready for drawing before using it. This usually involves a lengthy detour from the rendering loop, emptying the cpu cache over and over as it flip flops between tasks. One way to achieve this is to let the renderer use any available mip for that surface, if the desired mip is not ready - until it can be generated on a later frame. This makes textures appear to up-rez in a lazy manner over time as you move around.

Another strategy is to implement a threshold for surface pixels generated per frame, and 'throttle' when this threshold is hit. Throttling causes future mips to be adjusted smaller. Each successive tripping of the threshold causes the mip throttle to increase until it hits the smallest mipsize and all textures are generated with the lowest resolution until the cpu can catch up with the scene.
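
A minimal C sketch of such a throttle, under a few assumptions that are not stated above: the throttle resets at the start of each frame, it steps up once for every further budget's worth of pixels generated, and each mip level has a quarter of the pixels of the one before. `SurfBudget` and its functions are hypothetical names.

```c
#define MAX_MIP 3  /* smallest mip level */

typedef struct {
    int budget;     /* surface pixels allowed per frame before throttling (> 0) */
    int generated;  /* pixels generated so far this frame */
    int throttle;   /* extra mip levels to drop, 0..MAX_MIP */
} SurfBudget;

static void budget_begin_frame(SurfBudget *b)
{
    b->generated = 0;
    b->throttle = 0;
}

/* Ask to generate a surface at wanted_mip whose mip-0 size is mip0_pixels.
 * Returns the mip level that should actually be generated. */
static int budget_request(SurfBudget *b, int wanted_mip, int mip0_pixels)
{
    int mip = wanted_mip + b->throttle;
    if (mip > MAX_MIP)
        mip = MAX_MIP;

    /* Charge the cost of the mip actually generated (1/4 per level). */
    b->generated += mip0_pixels >> (2 * mip);

    /* Each further trip of the threshold steps the throttle up by one,
     * until everything is generated at the smallest mip. */
    int trips = b->generated / b->budget;
    b->throttle = trips > MAX_MIP ? MAX_MIP : trips;
    return mip;
}
```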

A better way is to build on these methods by never preparing a surface during rendering - only scheduling it for the next frame and using an impostor for the current frame (best available mip in the cache, or... flat-fill). This has the benefit that no detours are needed during rendering, and the threshold for pixel filling can become a true cap, since flat-fills need no surfaces to be generated at all. All surface cache filling can be done in a tight block before rendering, based on needs from the previous frame.
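
The deferred scheme could be sketched in C along these lines. All names here (`Surf`, `BuildQueue`, `surface_for_draw`, `build_pending`, `build_stub`) are made up for the example, and the cost model (a quarter of the pixels per mip level) is an assumption.

```c
#include <stddef.h>

#define MIP_LEVELS  4
#define MAX_PENDING 64

typedef struct {
    const void *cached[MIP_LEVELS];  /* NULL when that mip is not cached */
    int mip0_pixels;                 /* size of the surface at mip 0 */
} Surf;

typedef struct { Surf *surf; int mip; } Request;

typedef struct {
    Request pending[MAX_PENDING];
    int count;
} BuildQueue;

enum { DRAW_FLAT = -1 };

/* Called from the renderer. Never builds anything: it returns a cached mip
 * to draw with (the wanted one, else an impostor), or DRAW_FLAT, and queues
 * the wanted mip for the next frame. */
static int surface_for_draw(BuildQueue *q, Surf *s, int wanted)
{
    if (s->cached[wanted])
        return wanted;
    if (q->count < MAX_PENDING) {
        q->pending[q->count].surf = s;
        q->pending[q->count].mip = wanted;
        q->count++;
    }
    for (int m = wanted + 1; m < MIP_LEVELS; m++)   /* prefer coarser */
        if (s->cached[m]) return m;
    for (int m = wanted - 1; m >= 0; m--)           /* then finer */
        if (s->cached[m]) return m;
    return DRAW_FLAT;
}

/* Called once before rendering: builds queued surfaces in one tight block
 * until the pixel cap is reached. Returns the number built. */
static int build_pending(BuildQueue *q, int pixel_cap,
                         void (*build)(Surf *, int))
{
    int spent = 0, built = 0;
    for (int i = 0; i < q->count; i++) {
        int cost = q->pending[i].surf->mip0_pixels >> (2 * q->pending[i].mip);
        if (spent + cost > pixel_cap)
            break;                   /* true cap: the rest stay impostors */
        build(q->pending[i].surf, q->pending[i].mip);
        spent += cost;
        built++;
    }
    q->count = 0;  /* anything unbuilt is re-queued if it is still visible */
    return built;
}

/* Stand-in builder for the sketch: just marks the mip as present.
 * A real one would run the tile loop into the surface cache. */
static void build_stub(Surf *s, int mip)
{
    static const char marker = 0;
    s->cached[mip] = &marker;
}
```

Here `pixel_cap` plays the role of the single value that sets where a machine sits on the performance continuum.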

I like this approach because it allows a performance continuum which can be adjusted according to the speed of the machine using a single value.
AdamK
Captain Atari
Posts: 458
Joined: Wed Aug 21, 2013 8:44 am

Re: Quake 2 on Falcon030

Post by AdamK »

Dml, again, outstanding work.

Do you think that if all the DSP stuff were backported to m68k asm, it would have a playable framerate on 060? I'm thinking about lucky CT6x owners. Maybe an 060+DSP version might be possible?
Atari: FireBee, Falcon030 + CT60e + SuperVidel + SvEthlana, TT, 520ST + 4MB ST RAM + 8MB TT RAM + CosmosEx + SC1435, 1040STFM + UltraSatan + SM124, 1040STE 4MB ST RAM + 8MB TT RAM + CosmosEx + NetUSBee + SM144 + SC1224, 65XE + U1MB + VBXE + SIDE2, Jaguar, Lynx II, 2 x Portfolio (HPC-006)

Adam Klobukowski [adamklobukowski@gmail.com]

Re: Quake 2 on Falcon030

Post by dml »

AdamK wrote: Do you think that if all the DSP stuff were backported to m68k asm, it would have a playable framerate on 060? I'm thinking about lucky CT6x owners. Maybe an 060+DSP version might be possible?
I expect the 060 could run a more 'normal' version of the original code without all the changes I made, plus some decent 060 optimization to the heavy areas. A lot of the changes I made were aimed at the much smaller 030 caches, and a big performance ratio difference between DSP and CPU. I'm sure some of the redesign work I did would help but not as much as it helped on 030.

The 060 version would probably run faster with subdivided spans (perspective correction every 16 pixels, like the PC) and affine interpolation of each subspan since the FPU is lightning quick, i-cache is huge so more complex code is ok in the rendering loop, and no DSP host port comms is needed at any time during a scan.

It's better to use the DSP and perform per-pixel correction on 030, mainly because I couldn't find any other sensible way to do it. Every other route costs much, much more. Even subdividing spans doesn't really offer a speed gain on the DSP, because it incurs lead-time/setup work for each span which ends up stalling the CPU, especially on very short spans. i.e. you might get a 5% speed increase when looking at a wall, and a 20% decrease when looking out of a window. Better to do perfect pixel correction on every pixel and get a flat cost and a tiny cost for short spans.
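
To make the comparison concrete, here is a small C sketch of the two approaches for a single texture coordinate along a span. It uses floats for readability and hypothetical function names; it is not the engine's code, which does the per-pixel version in fixed point on the DSP.

```c
#define SUBSPAN 16  /* pixels between perspective corrections */

/* Per-pixel correction: one divide for every pixel.
 * uoz = u/z and ooz = 1/z at the span start; duoz, dooz are per-pixel steps. */
static void span_u_exact(float uoz, float ooz, float duoz, float dooz,
                         int count, float *u_out)
{
    for (int x = 0; x < count; x++)
        u_out[x] = (uoz + duoz * x) / (ooz + dooz * x);
}

/* Subdivided spans: one divide per 16-pixel subspan, with linear (affine)
 * interpolation of u in between. */
static void span_u_subdivided(float uoz, float ooz, float duoz, float dooz,
                              int count, float *u_out)
{
    float u0 = uoz / ooz;  /* exact u at the start of the span */
    int x = 0;
    while (x < count) {
        int n = count - x < SUBSPAN ? count - x : SUBSPAN;
        /* Exact u at the far end of this subspan. */
        float u1 = (uoz + duoz * (x + n)) / (ooz + dooz * (x + n));
        float step = (u1 - u0) / n;
        for (int i = 0; i < n; i++)
            u_out[x + i] = u0 + step * i;  /* affine inside the subspan */
        u0 = u1;
        x += n;
    }
}
```

The subdivided version matches the exact one at every 16th pixel and drifts in between. It also has a fixed setup cost per subspan, which is the lead-time that hurts on very short spans.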

When I'm done with 030 I'll try to get it working on 060 (+50MHz DSP?) to see how badly the DSP gets in the way. It could end up working better than my guesswork here - only one way to find out.

Re: Quake 2 on Falcon030

Post by AdamK »

I think the most common accelerated target is CT6x (so no acceleration on DSP), and then CT2b (030 at 50MHz and DSP at ... 32MHz(?)). So I think it'd be best if you chose one or the other.

Re: Quake 2 on Falcon030

Post by dml »

AdamK wrote: I think the most common accelerated target is CT6x (so no acceleration on DSP), and then CT2b (030 at 50MHz and DSP at ... 32MHz(?)). So I think it'd be best if you chose one or the other.
It would be nice to see what people could manage if they targeted the Falcon/060, as if it was a bare demo machine, as things used to be on the ST :) I think the results would be worth the pain and effort.

There's a lot of nice stuff coming over from the PC and Amiga (and of course some Falcon-only work), but it would be nice to see more of the unique, targeted work in the future...

Re: Quake 2 on Falcon030

Post by dml »

A new video will appear sometime this evening when I get free again.
