Quake 2 on Falcon030

dml
Fuji Shaped Bastard
Posts: 3988
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

AdamK wrote:
dml wrote: - rework code to minimize the time spent using CPU+DSP at the same time - make those blocks of time as narrow as possible
Why? Isn't processing in parallel faster?

Good question. Yes it is, but there are two levels of parallelism involved: fine-grain and coarse-grain. I'm looking for opportunities to use both kinds, currently focusing on the coarse-grain type. Fine-grain is already being used in most places where it seems to fit.


1) Fine-grain parallelism is where the two chips communicate/synchronize continuously on the same single task and neither is free to be used for a different task until the task is done. The BSP algorithm works like this. This method works ok if both chips have a roughly equal balance of work and one side doesn't cause the other to wait a lot. It is inefficient if a lot of exchanges are required between the two sides and/or if one side carries most of the load.

Usually in practice it involves the DSP stopping/starting and idling frequently in many small bursts and it's usually pretty complicated to optimize. Most of the code works this way where it can but not everything can use it sensibly. e.g. scan converting polygons doesn't benefit from the CPU at all and reindexing big geometry in main memory can't use the DSP.

So there are hybrid cases which can use both chips on the one task, and other cases which are CPU-only or DSP-only tasks.

2) Coarse-grain parallelism is where you take the remaining CPU-only and DSP-only tasks and try to run them at the same time, overlapping them as much as possible, without breaking anything.

What I meant by making the CPU+DSP cases as narrow as possible is lifting out from those cases any code which is CPU-only or DSP-only, and taking advantage of coarse-grain parallelism for those bits of code by executing them as separate routines. It doesn't compromise fine-grain parallelism at all, since it only affects pieces of code which aren't already working in a parallel way.
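
To illustrate the idea of narrowing the CPU+DSP blocks, here is a minimal Python sketch of a toy scheduler. It is not taken from the engine: the task names and durations are made up, and it assumes each task simply starts as soon as every chip it needs is free, ignoring data dependencies.

```python
def makespan(tasks):
    """Total time for a list of (name, resources, duration) tasks.

    Assumption: tasks are issued in list order and each one starts as
    soon as all the chips it needs are free (no data dependencies).
    """
    free_at = {"CPU": 0.0, "DSP": 0.0}
    for _name, resources, duration in tasks:
        start = max(free_at[r] for r in resources)
        end = start + duration
        for r in resources:
            free_at[r] = end
    return max(free_at.values())


BOTH = ("CPU", "DSP")

# Two wide hybrid tasks: both chips are tied up for the whole duration.
wide = [
    ("taskA", BOTH, 10.0),
    ("taskB", BOTH, 10.0),
]

# The same work with the CPU-only and DSP-only parts lifted out, so the
# DSP-only tail of A can overlap with the CPU-only head of B.
narrow = [
    ("A_head_cpu", ("CPU",), 3.0),
    ("A_core_both", BOTH, 4.0),
    ("A_tail_dsp", ("DSP",), 3.0),
    ("B_head_cpu", ("CPU",), 3.0),
    ("B_core_both", BOTH, 4.0),
    ("B_tail_dsp", ("DSP",), 3.0),
]

print(makespan(wide))    # 20.0
print(makespan(narrow))  # 17.0
```

The total amount of work is identical in both lists; the saving comes only from the tail of one task running while the head of the next is executing on the other chip.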

I built an Excel sheet showing the approximate time spent in each task and which resources are used by each, to help with the overlapping of parallel work. I haven't had time to complete it yet, but it will be posted later. :)

Post by dml »

Here is an approximate time-map of the reworked Q2 engine implementation for Falcon. This is for the untextured version. The textured version is more complicated in a couple of areas but doesn't change the approach at all.

Having a time-map helps to plan for parallelism opportunities, and highlights areas worth reworking in ways that could otherwise be very non-obvious.

Note: While it can be derived from the source alone, I confirmed the time-map using the Hatari profiler. The profiler % breakdown provides the approx times spent in each place, and a symbol trace from a pass over one frame indicates the order things are actually executed in. I also had to add some deliberate spoilers to the code, to force parallelism to stop so the time periods could be measured properly. I can't overstate the importance of having a tool or contrived test case for confirmation because a small mistake or faulty assumption will easily spoil parallelism without telling you. It's easy to assume you have optimized something when you haven't really ;)
time-map.png
Green is CPU-only, blue is DSP-only and orange is using both at once (fine-grained parallelism).

The only significant coarse-grain parallelism happening just now is between _R_ReindexFaceVertices_HeadCPU and _R_SubmitFaceGeometry_TailDSP, since they each use different hardware resources and the block they are executed in is looped several times so they naturally happen one after the other. This overlap saves about 10-15% time. Even this was only possible by removing a tiny DSP+CPU batch initialization event which used to sit between them and spoiled parallelism.
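
As a rough illustration of why a looped head/tail pair overlaps naturally, here is a small Python sketch of a two-stage pipeline. The per-batch times are hypothetical and constant, which the real routines are not, so this only shows the shape of the saving, not the measured 10-15%.

```python
def serial_time(batches, head_cpu, tail_dsp):
    """No overlap: every batch runs its CPU head, then its DSP tail."""
    return batches * (head_cpu + tail_dsp)


def overlapped_time(batches, head_cpu, tail_dsp):
    """DSP tail of batch i runs while the CPU head of batch i+1 executes.

    Assumption: constant per-batch times, so the slower stage sets the
    pace once the pipeline is full.
    """
    if batches == 0:
        return 0
    return head_cpu + (batches - 1) * max(head_cpu, tail_dsp) + tail_dsp


# Hypothetical numbers: 4 batches, 2 time units of CPU head, 3 of DSP tail.
print(serial_time(4, 2, 3))      # 20
print(overlapped_time(4, 2, 3))  # 14
```

A synchronizing CPU+DSP event between the two stages forces every batch back to the serial figure, which is why removing even a tiny one matters.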

However looking at the map I'd speculate on some other useful changes:

1) Try to delay _R_XFormProjectVertices_TailDSP by one execution stage, so it happens just before _R_SubmitFaceGeometry_TailDSP. Joining these two results in a single large DSP-only time block which stands more chance of overlapping with another CPU task (currently that would only be _R_ReindexFaceVertices_HeadCPU).

2) Rework _R_ProcessBSPVisitQueue_HeadCPU so it is done as part of the batching process, just before _R_ReindexFaceVertices_HeadCPU. This will let it overlap with the DSP blocks mentioned in (1). R_MarkLeaves might also be moved but this seems both less practical and less useful.

3) Find the surface batch size which gets the best overlap without leaving a long tail to process in the last batch (i.e. not too big) - but without adding CPU-side cost from cache misses etc. (not too small!).
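
The batch-size trade-off in (3) can be sketched with a toy cost model. Everything here is an assumption for illustration: the per-surface costs, the fixed per-batch CPU overhead standing in for cache misses and setup, and the candidate sizes. Real figures would have to come from profiling.

```python
def frame_time(total, batch, cpu_per_surf, dsp_per_surf, cpu_batch_overhead):
    """Simulate CPU-head / DSP-tail batches and return the finish time.

    The DSP cannot start a batch before the CPU has prepared it, and the
    last batch's DSP work is a tail that nothing overlaps with.
    """
    cpu_done = 0.0
    dsp_done = 0.0
    remaining = total
    while remaining > 0:
        n = min(batch, remaining)
        cpu_done += cpu_batch_overhead + n * cpu_per_surf
        dsp_done = max(dsp_done, cpu_done) + n * dsp_per_surf
        remaining -= n
    return dsp_done


def best_batch(total, candidates, **costs):
    """Pick the candidate batch size with the lowest simulated time."""
    return min(candidates, key=lambda b: frame_time(total, b, **costs))


# Hypothetical costs, in arbitrary time units.
costs = dict(cpu_per_surf=1.0, dsp_per_surf=1.0, cpu_batch_overhead=20.0)
sizes = [8, 16, 30, 60, 120, 240]

for b in sizes:
    print(b, frame_time(240, b, **costs))
print("best:", best_batch(240, sizes, **costs))  # best: 60
```

With these made-up costs a single batch of 240 takes 500 units (no overlap at all), batches of 8 take 848 (overhead dominates), and 60 is the sweet spot at 380.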

As mentioned already - all of this work is about parallelism. It's separate from actually optimizing the code - some of which still needs to be done anyway.
calimero
Fuji Shaped Bastard
Posts: 2639
Joined: Thu Sep 15, 2005 10:01 am
Location: Serbia

Post by calimero »

I have an off-topic question:

Since you have spent so much time reimplementing Doom and Quake (and accumulated knowledge in the process :)), can you see any benefit in terms of FPS (frames per second) if a 3D engine were designed specifically for Falcon hardware from scratch?

Or would levels designed specifically for Falcon hardware, running in your current Doom / Quake engine, be more beneficial?
using Atari since 1986. http://wet.atari.org ・ http://milan.kovac.cc/atari/software/ ・ Atari Falcon030/CT63/SV ・ Atari STe ・ Atari Mega4/MegaFile30/SM124 ・ Amiga 1200/PPC ・ Amiga 500 ・ C64 ・ ZX Spectrum ・ RPi ・ MagiC! ・ MiNT 1.18 ・ OS X

Post by dml »

calimero wrote:
I have an off-topic question:

Since you have spent so much time reimplementing Doom and Quake (and accumulated knowledge in the process :)), can you see any benefit in terms of FPS (frames per second) if a 3D engine were designed specifically for Falcon hardware from scratch?

Or would levels designed specifically for Falcon hardware, running in your current Doom / Quake engine, be more beneficial?

Yes, it's always beneficial to start from scratch with the platform design in mind, maybe a few times over :) And yes levels can be designed to be more suited for the platform but the tech is always a bit 'wrong' for it from the start.

e.g. I can really speed up the flatshaded version by welding plane-sharing faces together into 'super faces', which are normally kept separate for the sake of textures and lightmaps. But doing this is pointless if texturing is going to be used (which it is).
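
For readers unfamiliar with the idea, here is a minimal Python sketch of how candidate 'super faces' could be found: faces are grouped when they lie on the same plane and share an edge. The data layout (a plane id plus a list of vertex indices) is hypothetical, and the sketch only finds the groups; it does not build the merged outline or check that the result is convex.

```python
from collections import defaultdict


def weld_groups(faces):
    """Group faces that share a plane and are connected by shared edges.

    faces: list of (plane_id, [vertex indices in winding order]).
    Returns a sorted list of lists of face indices, one per group.
    """
    parent = list(range(len(faces)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edge_owner = {}  # (plane, low vertex, high vertex) -> first face seen
    for idx, (plane, verts) in enumerate(faces):
        for a, b in zip(verts, verts[1:] + verts[:1]):
            key = (plane, min(a, b), max(a, b))
            if key in edge_owner:
                parent[find(idx)] = find(edge_owner[key])
            else:
                edge_owner[key] = idx

    groups = defaultdict(list)
    for idx in range(len(faces)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


faces = [
    (0, [0, 1, 2, 3]),    # floor quad
    (0, [1, 4, 5, 2]),    # floor quad sharing edge 1-2 with the first
    (1, [2, 5, 6, 7]),    # wall: touches the floor but lies on another plane
    (0, [8, 9, 10, 11]),  # same plane as the floor, but not connected
]
print(weld_groups(faces))  # [[0, 1], [2], [3]]
```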

The main problem with creating an engine from scratch is lack of data and tools. Been there and done that. Takes a lot of effort to make toolchains. Reworking an existing technology lets you use a lot of well developed tools.

Not that I'm against doing stuff from scratch - I'm hoping to do something like that next anyway, but taking such a path will have to avoid the need for big data.

Post by dml »

I took 3 grabs recently to compare performance in a problematic location.

1) From one of the recent YT videos, using older code
qvid.png
2) From Hatari 1.8 using latest code
qh18.png
3) On real F030 using latest code
qf030.png
I picked one of the slowest locations in the vid, and one that hasn't shown much improvement over many revisions - but it's finally showing a decent speedup now with more recent changes. It's also showing a healthier improvement on real HW after some of the d-cache optimizations to the BSP stuff.

It's still a slog, going through all the code and rewriting each bit one at a time. I'm about half way through the DSP side of the geometry code now, and still a lot to do. Will see what more can be done with it this evening when I get free.

Post by dml »

After last night's DSP optimizations the FPS on real F030 at this location rose from 5.33 to 5.44. Not a big change but at least still going in the right direction :)
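
To put that figure in terms of frame time, a quick back-of-envelope calculation in Python (only the two FPS values come from the post; the helper names are made up):

```python
def frame_ms(fps):
    """Milliseconds spent on one frame at a given frame rate."""
    return 1000.0 / fps


def speedup_percent(old_fps, new_fps):
    """Relative frame-rate gain, in percent."""
    return (new_fps / old_fps - 1.0) * 100.0


old_fps, new_fps = 5.33, 5.44
saved = frame_ms(old_fps) - frame_ms(new_fps)
print(f"{speedup_percent(old_fps, new_fps):.1f}% faster, "
      f"{saved:.1f} ms saved per frame")
# prints "2.1% faster, 3.8 ms saved per frame"
```

At low frame rates each frame is long, so even a small percentage gain is several milliseconds per frame.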

The time-map has been reordered a bit (as explained previously) to give more opportunities for parallelism but they haven't all been explored yet.
Re: Quake 2 on Falcon030

Post by calimero »

so 20% is a visible upgrade :)
Re: Quake 2 on Falcon030

Post by dml »

FPS on 'fatal1' startpoint rose from 12 to over 16fps.

I'm going to keep hacking at this until I get completely stuck for ways to speed it up in TC mode, and then will switch back to texturing performance.
kristjanga
Captain Atari
Posts: 400
Joined: Sat Jul 25, 2009 3:35 pm

Post by kristjanga »

NICE :D
Scarlettkitten
Captain Atari
Posts: 262
Joined: Thu Mar 19, 2009 11:42 am
Location: Northamptonshire, UK

Post by Scarlettkitten »

I can't believe the fps you're getting, superb work :cheers:
My musical dribbles 🎶 https://sophie-rose.bandcamp.com
Mega ST4, 520STM.

Post by dml »

:cheers:

Finished rewriting the clipper last night, properly optimized now. Result is a small increase in peak speed but flatter performance everywhere - less variation in FPS when viewing in different directions.

Small speedup in the problem area I've been tracking (5.53fps) - but I think this one is really CPU-bound and not much affected by recent DSP optimizations. Will need to re-analyze everything again to see what should be done next.

Post by dml »

I'll probably do a new video next, as most of the important geometry optimizations are done. A few remain but I'll deal with them later. Also a welcome break from staring at the same code :)

Post by dml »

I had started recording material for an updated vid and then noticed another, perhaps important optimization. So I'll probably try that first and repeat the recordings.

I also notice that I'm sometimes drawing too much stuff compared with the original engine, because the portal test for connected rooms is not being done - it always draws through doorways. Which looks ok since doors are currently invisible - but it's more drawing than intended by the map in some places. In Q2 some of those doors will be shut when not near them, and the portals turned 'off' with drawing stopped at the door.

In fact areas which use portals to divide the map will be significantly affected because they were placed deliberately in the map, and probably for good reason (complex inside + outside areas, separated by a door, doubling the polycount when viewed at a distance with the portal open!).

I'll probably have to fix this to get more favourable comparisons with Q2 in some maps, but will leave it until later.
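
As a sketch of what such a portal test amounts to, here is a small Python flood fill over map areas. The data layout and names are hypothetical and this is not the engine's or Q2's actual code: areas are nodes, portals are links between two areas, and only open portals are followed. Anything in an area that cannot be reached from the viewer's area can be skipped when drawing.

```python
from collections import deque


def reachable_areas(start_area, portals, open_portals):
    """Return the set of areas connected to start_area via open portals.

    portals: dict mapping portal id -> (area_a, area_b).
    open_portals: set of portal ids that are currently open.
    """
    neighbours = {}
    for pid, (a, b) in portals.items():
        if pid in open_portals:
            neighbours.setdefault(a, []).append(b)
            neighbours.setdefault(b, []).append(a)

    seen = {start_area}
    queue = deque([start_area])
    while queue:
        area = queue.popleft()
        for nxt in neighbours.get(area, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Three areas in a row, separated by two doors.
portals = {"door1": (0, 1), "door2": (1, 2)}
print(reachable_areas(0, portals, {"door1", "door2"}))  # {0, 1, 2}
print(reachable_areas(0, portals, {"door1"}))           # {0, 1}
print(reachable_areas(0, portals, set()))               # {0}
```

Always drawing through doorways corresponds to treating every portal as open, which is the first case above.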

Post by dml »

That last optimization was quite effective - creating the time map and moving things around exposed a few opportunities and one of them allowed 2 routines to be collapsed into 1, removing 40% of the work in that area.

The fatal1 startpoint is close to 17fps now in Hatari (over 17 on F030), and some more dense areas see a bigger improvement.

But I now see another optimization on the DSP side affecting BSP traversal. This one could also be valuable.

Those aside (and a few minor ones not mentioned), I think I'm probably out of structural optimizations to the geometry side and it may not be possible to make it significantly faster by continuing to work on those. i.e. it's getting tough to find other ways to improve it in decent sized steps - at least using the original game data as it is. Further improvements will likely be small things.

For the drawing side, I do also see a way to optimize further for dense scenery but the impact will mainly be for flat-filling. It is still applicable but less useful for textures.

Post by dml »

The DSP optimization for BSP traversal turns out to be very tricky - at the very least it involves self-modifying-code and I haven't had time to properly check that it will work with the instruction sizes needed, in the cycles available. It needs to fill out a 5x6 DSP matrix with view-dependent ordering, within the time taken to write 6 words across the host port from the CPU side without a sync.

I quickly drafted some code which looks about right but won't get a chance to try it properly until next week.

Post by dml »

I compiled a new video last night, focused mainly on outdoor or sprawling maps / big geometry / angled, non-boxy scenery. Deep stress testing for recent optimization work. (I think my Falcon actually squeaked - so much violence after 15 years asleep :) )

There are no textures in this one - it is flat-fill @ 320x160 / 16bit TC. This is the format I use for performance testing the engine code.


Anyway I think it's starting to run up against some hardware limits at 16/32, or at least design limits for what I've done with the program. I'm sure there are still ways to optimize it, but it's getting harder and taking longer with each try. The last optimization I tested was nasty, complicated and didn't really make much difference in the end... 1-2%. So I'm going to finally stop with this and fix the newly added bugs before looking at textures again.

I'll post a link once YT has digested it. Internet is broken here so it could be a while...

Post by dml »

Much pain uploading that, but it's visible now:

https://www.youtube.com/watch?v=2KbDM-Bw80U

Trivia: ARMA5 is a map I used to play at lunchtimes while I was working on PC games. I had a 450MHz PII (P3?) at the time, with an early NV graphics card. It's quite heavy going but the old bird just about copes :)

[EDIT] ...and permissions are now fixed too.
bullis1
Fuji Shaped Bastard
Posts: 2301
Joined: Tue Dec 12, 2006 2:32 pm
Location: Canada

Post by bullis1 »

Wow! The results of your latest optimizations are staggering! Even the most complex scenes depicted in the video had a satisfyingly smooth framerate.
Member of the Atari Legend team
instream
Nature
Posts: 176
Joined: Mon Aug 03, 2009 9:08 am
Location: Floda, Sweden

Post by instream »

Looks awesome :D

Post by Scarlettkitten »

For a 16MHz Falcon, this is amazing :cheers:
Atari030
Atari Super Hero
Posts: 784
Joined: Mon Feb 27, 2012 6:14 am
Location: Melbourne, Australia

Post by Atari030 »

Bloody unreal. I never expected it to be that good. I probably should have. :wink:

Post by calimero »

The amount of polygons is crazy and the frame rate staggering!

I cannot stop comparing THIS to every previous Falcon 3D polygon game - Doug's code is from another dimension! :)
FedePede04
Atari God
Posts: 1215
Joined: Fri Feb 04, 2011 12:14 am
Location: Denmark

Post by FedePede04 »

I really enjoy watching your videos and seeing the fruit of your work.
You are also good at selecting the right music for them, so good job :cheers:
Atari will rule the world, long after man has disappeared

sometime my English is a little weird, Google translate is my best friend :)
CiH
Atari God
Posts: 1266
Joined: Wed Feb 11, 2004 4:34 pm
Location: Middle Earth (Npton) UK

Post by CiH »

Doug, this is crazy sh1t! :mrgreen:
"Where teh feck is teh Hash key on this Mac?!"
TheNameOfTheGame
Fuji Shaped Bastard
Posts: 2612
Joined: Mon Jul 23, 2012 8:57 pm
Location: Almost Heaven, West Virginia

Post by TheNameOfTheGame »

I thought it would be interesting to watch but too low fps to be smooth. Boy was I wrong! 8O

This looks fantastic!!
