Quake 2 on Falcon030

dml
Fuji Shaped Bastard
Posts: 3978
Joined: Sat Jun 30, 2012 9:33 am

Quake 2 on Falcon030

Post by dml »

So I started looking at Quake 2 towards the end of the current BadMooD thread, and will move the Q2 stuff over here from now on.

(old thread: http://www.atari-forum.com/viewtopic.ph ... 00#p255911)

Yesterday after work I figured out how to deal with the world vertices in a camera-relative normalized space such that everything can be done with 32bit integers, and most probably with 24bit integers. This means that the whole vertex pipeline can probably be DSP'd, which is one step better than what I thought when I started looking at it. There are enough vertices being processed for this to matter. Doing it on the CPU even with assembler is proving to be a bit expensive with all the 32bit multiplies and the DSP is great for it anyway. The camera matrix could be 16bit, but even camera-local vertices need more than that for these worlds so ST-style optimizations don't seem to be a good fit here. Maybe there's room for some weird hacks if a TT version is looked at but not the best solution on Falcon for sure.

The current vertex pipeline is now 90% integer based (the original is 100% floats), but it's actually doing both integer and floating point simultaneously in order to validate the changes work properly. I will leave it like that until it is 100% converted and reasonably well tested.
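A minimal sketch of the "run integer and float side by side" validation described above. The 16.16 format, the flat row-major 3x3 matrix and every name here (fix16, rotate_fix, rotate_check) are assumptions for illustration; the post does not show the real data layout or number format.

```c
#include <assert.h>
#include <stdint.h>

#define FIX_SHIFT 16                    /* assumed 16.16 fixed point */
#define FIX_ONE   (1 << FIX_SHIFT)

typedef int32_t fix16;                  /* hypothetical type name */

/* Convert a float to 16.16, rounding to nearest. */
static fix16 fix_from_float(float f)
{
    assert(f > -32768.0f && f < 32767.0f);   /* must fit in 16.16 */
    return (fix16)(f * (float)FIX_ONE + (f >= 0.0f ? 0.5f : -0.5f));
}

/* 32x32 -> 64 bit multiply, keeping the middle 32 bits.
   Relies on arithmetic right shift of negative values (true for GCC). */
static fix16 fix_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * (int64_t)b) >> FIX_SHIFT);
}

/* Integer path: rotate a camera-relative vertex by a row-major 3x3 matrix. */
static void rotate_fix(const fix16 *m, const fix16 *v, fix16 *out)
{
    int r;
    for (r = 0; r < 3; r++)
        out[r] = fix_mul(m[r * 3 + 0], v[0])
               + fix_mul(m[r * 3 + 1], v[1])
               + fix_mul(m[r * 3 + 2], v[2]);
}

/* Float path: the reference the integer result is checked against. */
static void rotate_float(const float *m, const float *v, float *out)
{
    int r;
    for (r = 0; r < 3; r++)
        out[r] = m[r * 3 + 0] * v[0] + m[r * 3 + 1] * v[1] + m[r * 3 + 2] * v[2];
}

/* Run both paths on the same input and return the largest absolute error. */
static float rotate_check(const float *m, const float *v)
{
    fix16 mi[9], vi[3], oi[3];
    float of[3], worst = 0.0f;
    int i;

    for (i = 0; i < 9; i++) mi[i] = fix_from_float(m[i]);
    for (i = 0; i < 3; i++) vi[i] = fix_from_float(v[i]);

    rotate_fix(mi, vi, oi);
    rotate_float(m, v, of);

    for (i = 0; i < 3; i++) {
        float d = (float)oi[i] / (float)FIX_ONE - of[i];
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Calling rotate_check() over a batch of test vertices and tracking the worst error is one way to decide when the float path can be dropped. The sketch ignores overflow in the three-term sum, which a real pipeline would have to bound.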
EvilFranky
Atari Super Hero
Posts: 926
Joined: Thu Sep 11, 2003 10:49 pm
Location: UK

Re: Quake 2 on Falcon030

Post by EvilFranky »

Subscribed :mrgreen: :coffe:
Cyprian
10 GOTO 10
Posts: 3332
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: Quake 2 on Falcon030

Post by Cyprian »

just awesome Doug
Lynx I / Mega ST 1 / 7800 / Portfolio / Lynx II / Jaguar / TT030 / Mega STe / 800 XL / 1040 STe / Falcon030 / 65 XE / 520 STm / SM124 / SC1435
DDD HDD / AT Speed C16 / TF536 / SDrive / PAK68/3 / Lynx Multi Card / LDW Super 2000 / XCA12 / SkunkBoard / CosmosEx / SatanDisk / UltraSatan / USB Floppy Drive Emulator / Eiffel / SIO2PC / Crazy Dots / PAM Net
Hatari / Steem SSE / Aranym / Saint
http://260ste.atari.org
dml
Fuji Shaped Bastard
Posts: 3978
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

Cyprian wrote:just awesome Doug
Thanks, as always :)

I figured out the remaining problem at lunchtime shortly after posting that. Turns out I had forgotten to mark a clobber on one of my GCC assembly fragments, so 'correct code' was turning into 'bad things' when compiled. I do that sometimes, it's not always easy to spot but clues start forming when the debug build works :)

This is all debug code anyway for testing the idea mostly in C so it will disappear later.


(for anyone using GCC with asm - take great care to read the docs, over and over and over until you can remember it with your eyes closed)

[EDIT]

In case anyone cares, the faulty code looked like this:

Code:

		...
		muls.l	(%[src]),d2:d0;		\
		move.w	d2,d0;			\
		swap	d0;			\
		muls.l	4(%[src]),d2:d1;	\
		move.w	d2,d1;			\
		swap	d1;			\
		add.l	%[vpx],d0;		\
		add.l	%[vpy],d1;		\
		move.l	d0,(%[dest]);		\
		move.l	d1,4(%[dest]);		\
"
		: "=m"(*dest) // FIXED: we're modifying the target pointed by *dest !!!!!!
		: [dest] "a"(dest), 
		  [src] "a"(src), 
		  [vpxs] "d"(Cam->vpxs), 
		  [vpxr] "d"(Cam->vpxr), 
		  [vpx] "g"(Cam->vpx), 
		  [vpy] "g"(Cam->vpy)
		: "d0", "d1", "d2", "cc"
	);
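For anyone reading along, here is a small sketch of the same idea in isolation: telling GCC that an asm fragment writes through a pointer. The function name store_pair is hypothetical and the fragment is far simpler than the one above. Note that "=m"(*dest) only describes an object of the type dest points to; since the type of dest is not shown in the post, this sketch assumes two 32-bit words are written and uses the array-cast form that GCC's documentation describes for memory operands of known size. A "memory" clobber would be the blunter alternative. On non-m68k targets the sketch falls back to plain C so it still builds.

```c
#include <assert.h>
#include <stdint.h>

/* Store two 32-bit values through dest (hypothetical helper). */
static void store_pair(int32_t *dest, int32_t a, int32_t b)
{
    assert(dest != 0);
#if defined(__GNUC__) && defined(__m68k__)
    __asm__ __volatile__ (
        "move.l %[a],(%[dest])\n\t"
        "move.l %[b],4(%[dest])"
        /* Output: both words at dest are written by the asm. Without this,
           the compiler may assume memory at dest is unchanged and reorder
           or drop loads and stores around the fragment. */
        : "=m" (*(int32_t (*)[2])dest)
        : [dest] "a" (dest), [a] "d" (a), [b] "d" (b)
        : "cc"
    );
#else
    /* Portable fallback with the same effect. */
    dest[0] = a;
    dest[1] = b;
#endif
}
```

The symptom described above (debug build works, optimized build does not) is typical when an operand like this is missing, because only the optimizer acts on the wrong assumption.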
Zamuel_a
Atari God
Posts: 1285
Joined: Wed Dec 19, 2007 8:36 pm
Location: Sweden

Re: Quake 2 on Falcon030

Post by Zamuel_a »

Do you think it's possible to make a playable version of Quake 2 on a regular Falcon or will it just be a viewer of the maps? I guess something like 5fps is necessary to call it playable.
ST / STFM / STE / Mega STE / Falcon / TT030 / Portfolio / 2600 / 7800 / Jaguar / 600xl / 130xe
dml
Fuji Shaped Bastard
Posts: 3978
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

Zamuel_a wrote:Do you think it's possible to make a playable version of Quake 2 on a regular Falcon or will it just be a viewer of the maps? I guess something like 5fps is necessary to call it playable.
For now, it's just an experiment in drawing the world. This is the main limiting factor for doing anything else.

I'm speculating that a flat/Gouraud shaded world is feasible, at least for some subset of available maps. It's probably quite close to the edge of what the Falcon can manage without acceleration. Much will depend on how effective the DSP is at the spanbuffer implementation, and bandwidth for vertices and edges.

I will attempt texturing if the first part is successful.

Feasibility of the game is difficult to estimate - would involve a lot of work in other areas - meshes and a z-buffer to support that. A low-res chunky mode will help with the zbuffer, but meshes will depend on DSP and bus performance. Singleplayer would probably be a serious challenge and maybe out of scope. 1-on-1 multiplayer though, is perhaps doable.

As for the upper layers of the engine - collision detection etc. is probably feasible without removing the floating point but will still set a cap on the framerate. Hard to speculate on the cost just now. Performing the collision detection without floating point would likely be a nightmare.

Will just have to take it in chunks and see where it goes and exactly where the Falcon runs out of steam.


The Q2 graphics pipeline is simpler and more elegant than Doom's. This is an advantage. There are more ways and more effective ways to optimise it in one direction or another. Doom was a headache, really. There were lots of nice tricks and stuff you could exploit to save time but it was very complicated to do so.
Dal
Administrator
Posts: 4224
Joined: Mon Feb 20, 2006 9:00 pm
Location: Cheltenham, UK

Re: Quake 2 on Falcon030

Post by Dal »

How about targeting this as a CT60-release?
STE: Desktopper case, IDE interface, UltraSatan (8GB + 512Mb) + HXC floppy emulator. Plus some STE's/STFM's
Scarlettkitten
Captain Atari
Posts: 262
Joined: Thu Mar 19, 2009 11:42 am
Location: Northamptonshire, UK

Re: Quake 2 on Falcon030

Post by Scarlettkitten »

Even a wireframe or flat quake 2 would be amazing on the falc. 8)
My musical dribbles 🎶 https://sophie-rose.bandcamp.com
Mega ST4, 520STM.
dml
Fuji Shaped Bastard
Posts: 3978
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

Dal wrote:How about targeting this as a CT60-release?
I would in fact have ported the Amiga/060 one by now but I've been slow to set up the CT60 and configure it well enough to work natively on a big project. I initially tried to replace the hw/OS bits using Hatari and cross-tools over a weekend but Hatari doesn't provide TT Ram yet so it wasn't very practical.

(I've been bugging the nice Hatari guys to add TT ram for a while so hopefully it will appear in 1.9 ;-) )
EvilFranky
Atari Super Hero
Posts: 926
Joined: Thu Sep 11, 2003 10:49 pm
Location: UK

Re: Quake 2 on Falcon030

Post by EvilFranky »

Probably a stupid question given how much power a 100mhz 060 provides...but would the DSP be any help towards the rendering on an 060 version?
Dal
Administrator
Posts: 4224
Joined: Mon Feb 20, 2006 9:00 pm
Location: Cheltenham, UK

Re: Quake 2 on Falcon030

Post by Dal »

As an additional thought, if the screen resolution was acquired from the VDI/AES, then the SuperVidel's higher resolutions could be implicitly supported? :D
Dal
Administrator
Posts: 4224
Joined: Mon Feb 20, 2006 9:00 pm
Location: Cheltenham, UK

Re: Quake 2 on Falcon030

Post by Dal »

EvilFranky wrote:Probably a stupid question given how much power a 100mhz 060 provides...but would the DSP be any help towards the rendering on an 060 version?
On bus-accelerated machines, I would say yes. Also consider the DSP acts almost as a second 'processor' so can be performing calcs separate to the 060 if I am not much mistaken...?
dml
Fuji Shaped Bastard
Posts: 3978
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

Dal wrote:As an additional thought, if the screen resolution was acquired from the VDI/AES, then the SuperVidel's higher resolutions could be implicitly supported? :D
Yes probably. I don't see anything limiting output resolution in the source. IIRC the original had resolution settings anyway, like Quake.

However putting the resolution up is a good way to make it very much slower, for reasons that have nothing to do with the speed of the video memory, so I don't know how useful it will be for that game...
dml
Fuji Shaped Bastard
Posts: 3978
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

Dal wrote:
EvilFranky wrote:Probably a stupid question given how much power a 100mhz 060 provides...but would the DSP be any help towards the rendering on an 060 version?
On bus-accelerated machines, I would say yes. Also consider the DSP acts almost as a second 'processor' so can be performing calcs separate to the 060 if I am not much mistaken...?
It's great for audio, if it's kept out of graphics work. It doesn't get in the way providing data is flowing towards it via DMA and nothing needs to be polled on the way out, which suits audio.

It can process geometry quite well but can't get data in or out fast enough for that to be worthwhile so even as a parallel unit, it's not serving much use. It will be quite effective for hidden surface removal but still probably lagging the 060 by some margin.

Without DSP of course the 030 version stands no chance whatsoever. :)
Eero Tamminen
Fuji Shaped Bastard
Posts: 3993
Joined: Sun Jul 31, 2011 1:11 pm

Re: Quake 2 on Falcon030

Post by Eero Tamminen »

Regarding audio, does Quake still use MIDI style music content like Doom did?
dml
Fuji Shaped Bastard
Posts: 3978
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

Eero Tamminen wrote:Regarding audio, does Quake still use MIDI style music content like Doom did?
No, the supplied music was on CD. There is no MIDI in Q1 or Q2, at least as far as I know.
dml
Fuji Shaped Bastard
Posts: 3978
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

There are now 3 short vids in the 'series' showing effects of changes to the code.

https://www.youtube.com/watch?v=J7KCzRt ... 5nMm10m0UM

Not very exciting but progress is visible anyway. I'll post a new one each time something changes in a meaningful way.
dml
Fuji Shaped Bastard
Posts: 3978
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

All vertex transform and projection is now on the DSP, and the cost for that stuff combined has dropped from about 35% total time to about 8% total time.

This cost is still inflated by the fact that the vertices have to be extracted again, and the code is synchronous so it's blocking/idling 60% of the time. The true cost is probably less than 5% with a lot of that being data transfers. So that all seems like good news because there's lots of room to speed it up.

The DSP code was all done in a hurry so it's not optimized at all. I just tried to make it as correct as possible, so it's probably possible to reduce vertex processing time to <2% of current time.

Around 55% of total time now is spent processing faces and clipping edges, which will need reworked next. Some BSP/vis/face stuff can't be moved to the DSP and will just need to be optimized a lot. All of the clipping and edge work can be moved though.

I won't bother uploading a video of the last changes because it looks the same again, just a bit faster :) Will wait until the next part is done when I find time for that, it's a bit fiddly.

[EDIT]

A quick test using the current version at a really slow area yielded about 2fps, or 500ms per frame. The vertex processing is still hovering at 7% so that's 35ms (or around 1.5 video frames) to project the scene into 3D without clipping. That's already fast enough to make the thing work, even using the crappy code. So the vertex processing problem seems to be solved at least for baseq1. :twisted:
dml
Fuji Shaped Bastard
Posts: 3978
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

Made quite a lot of changes without any interesting speed gains to make up for the effort. :) But that was expected.

There are 10k+ vertices in the map and only room on the DSP for 1k-2k verts at a time, taking into account other things needed. So I had to change the way the vertices were being processed to make sure they had a packed/reindexed representation on the DSP.

Now the vertices are sent to the DSP in camera-local space, and transformed/projected on the DSP and remain there. They don't get extracted again as was the case before.
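One way to picture the packed/reindexed representation is a remap table built per batch: each map vertex gets a compact DSP-side slot the first time it is referenced. This is only a sketch under assumptions; the capacities, the 0xFFFF sentinel and the names (vert_remap, remap_vertex) are hypothetical and not taken from the actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAP_VERTS   16384     /* hypothetical upper bound on map vertices */
#define DSP_VERTS   1024      /* hypothetical DSP-side capacity per batch */
#define UNUSED_SLOT 0xFFFFu   /* sentinel: vertex not packed yet */

typedef struct {
    uint16_t slot_of[MAP_VERTS];   /* map vertex index -> packed DSP slot */
    uint16_t map_of[DSP_VERTS];    /* packed DSP slot -> map vertex index */
    uint16_t count;                /* slots used so far in this batch */
} vert_remap;

/* Start a new batch: no map vertex has a slot yet. */
static void remap_begin(vert_remap *r)
{
    memset(r->slot_of, 0xFF, sizeof r->slot_of);
    r->count = 0;
}

/* Return the packed slot for a map vertex, allocating one on first use.
   Returns UNUSED_SLOT when the batch is full and must be flushed. */
static uint16_t remap_vertex(vert_remap *r, uint16_t map_index)
{
    assert(map_index < MAP_VERTS);

    if (r->slot_of[map_index] != UNUSED_SLOT)
        return r->slot_of[map_index];        /* already packed */
    if (r->count == DSP_VERTS)
        return UNUSED_SLOT;                  /* no room left */

    r->slot_of[map_index] = r->count;
    r->map_of[r->count] = map_index;
    return r->count++;
}
```

Edges and faces would then refer to vertices by packed slot, so the DSP never needs to know the original map indices. Clearing the whole table per batch is the simple choice here; a generation counter would avoid the memset at the cost of a wider table.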

The z-clipping part has also been moved to the DSP and is mostly working although not perfect yet.

The test is still slow because the clipped edges get extracted again and clipped in 2D by the CPU before committing to display. Most of the time is now spent hanging around there.

Things are improving though - once the 3D clipping has been fixed and the 2D clipping has been DSP'd, it will begin to get more interesting.
Zamuel_a
Atari God
Posts: 1285
Joined: Wed Dec 19, 2007 8:36 pm
Location: Sweden

Re: Quake 2 on Falcon030

Post by Zamuel_a »

Would Quake 1 be faster to run? I think the engine is more or less the same, but maybe the levels are not so complex?
dml
Fuji Shaped Bastard
Posts: 3978
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

Zamuel_a wrote:Would Quake 1 be faster to run? I think the engine is more or less the same, but maybe the levels are not so complex?
It might be, but Q2 seems like a harder target :twisted:

But if it ends up being too slow, Q1 maps are still waiting and will probably be easier/quicker to draw.


[EDIT]

Got the scene vertex and edge processing down to 2.7% and 5.5%, with the latter still being in C so there's room left.

5.56% _P_ProcessTaggedEdges
2.72% _R_TransformAndProjectTaggedVertices_DSP56k

Also got zclipping to work now. So it can move along.
dml
Fuji Shaped Bastard
Posts: 3978
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

Quick update.

The DSP is now transforming and projecting the scene as fast as the CPU can send the x,y,z coordinates - that's like 'free hardware 3D' :), so while there is still a lot of room to optimize the DSP code there isn't any benefit in doing so until the CPU side gets some work. I've left that side of things alone for now.

I have also figured out a way to avoid sending all of the scenery every frame. However once again it's not worth doing if the CPU can't figure that out faster than it can copy the data. It's best done as vertex groups and therefore probably map clusters, to make it worthwhile.

I'm working on the 2D clipper now, which is quite lengthy and hard to debug. From what I can see Quake clips faces 'properly' with dotproducts against frustum planes. It's simple, compact and eliminates vertices and edges early, before projection.

I'm wary of using this on the DSP with integer math because it relies a lot on precision, with failures causing edges to wander off the screen. This gets worse for near z's and the amount of wander could be significant - enough to corrupt memory or other badness. It also needs a margin around the display for small errors and this doesn't work at all if you want to shrink the window smaller than the display.

That approach is just a potential nightmare of bugs and weirdness, so I'm going to sidestep it completely and clip edges in 2D, after projection. Once that works, a full 3D clipper can always be tried with a working 2D version to compare against. I have no idea what will be faster because 2D clipping is trivial on the DSP, the amount of projection work saved by a 3D clipper is moderate to small, and a winding clipper using dotproducts means having the DSP deal with higher representations of faces (sending a bit more data). So it's not worth guessing what works best at this point. I'll try the reliable one first, and then perhaps the other one later.
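As an illustration of clipping an already-projected edge against the window, here is a sketch using Cohen-Sutherland outcodes with integer math. The thread does not say which 2D algorithm is used, so this is a stand-in technique, and the names (clip_rect, clip_edge_2d) are hypothetical. Because the result is clamped exactly to the window bounds, it also works when the window is smaller than the display.

```c
#include <assert.h>
#include <stdint.h>

enum { OUT_LEFT = 1, OUT_RIGHT = 2, OUT_TOP = 4, OUT_BOTTOM = 8 };

typedef struct { int32_t xmin, ymin, xmax, ymax; } clip_rect;  /* inclusive */

/* Classify a point against the four window boundaries. */
static int outcode(const clip_rect *r, int32_t x, int32_t y)
{
    int code = 0;
    if (x < r->xmin) code |= OUT_LEFT;  else if (x > r->xmax) code |= OUT_RIGHT;
    if (y < r->ymin) code |= OUT_TOP;   else if (y > r->ymax) code |= OUT_BOTTOM;
    return code;
}

/* Clip an edge in place. Returns 1 if part of it is inside, 0 if none is. */
static int clip_edge_2d(const clip_rect *r,
                        int32_t *x0, int32_t *y0, int32_t *x1, int32_t *y1)
{
    int c0, c1;

    assert(r->xmin <= r->xmax && r->ymin <= r->ymax);
    c0 = outcode(r, *x0, *y0);
    c1 = outcode(r, *x1, *y1);

    for (;;) {
        int c;
        int32_t x, y;
        int64_t dx, dy;

        if (!(c0 | c1)) return 1;   /* both endpoints inside */
        if (c0 & c1)    return 0;   /* both beyond the same boundary */

        c  = c0 ? c0 : c1;          /* pick an endpoint that is outside */
        dx = (int64_t)*x1 - *x0;
        dy = (int64_t)*y1 - *y0;

        /* The divisor is never zero: a zero dx (or dy) with one endpoint
           outside puts both outside, which was rejected above. */
        if (c & OUT_LEFT) {
            x = r->xmin; y = (int32_t)(*y0 + dy * ((int64_t)r->xmin - *x0) / dx);
        } else if (c & OUT_RIGHT) {
            x = r->xmax; y = (int32_t)(*y0 + dy * ((int64_t)r->xmax - *x0) / dx);
        } else if (c & OUT_TOP) {
            y = r->ymin; x = (int32_t)(*x0 + dx * ((int64_t)r->ymin - *y0) / dy);
        } else {
            y = r->ymax; x = (int32_t)(*x0 + dx * ((int64_t)r->ymax - *y0) / dy);
        }

        if (c == c0) { *x0 = x; *y0 = y; c0 = outcode(r, x, y); }
        else         { *x1 = x; *y1 = y; c1 = outcode(r, x, y); }
    }
}
```

The 64-bit intermediates are there for safety in C; on the DSP the same intersection math would need its operand ranges bounded instead.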

The clipper will probably be a bit of a bottleneck for inserting edges whatever is used, but it can be optimized in various ways and is still probably an order of magnitude less expensive than the next stage needing solved.


All of the stages so far are resolution-independent. The amount of work does not shrink as you attempt to reduce the size of the window being drawn. It is all processed as coordinates and adjacency information. This is one of the reasons I have attacked it first, and am putting some effort into estimating how much its impact can be reduced - because once that limit is found it will set the upper bound for the framerate - in any drawing mode or at any window size. If it is found that most maps produce too much work of this kind, it impacts the feasibility of everything else and any decisions made about what to do with it next.

Nearly all of the stages which follow will be resolution-dependent, and costs will change as the window is reduced. This is obviously nice because it's a last-resort band-aid on the second stages not being fast enough :)

Ignoring textures for now, there are two obvious 'serious bottlenecks' which will need overcome on a stock Falcon.

1) Edge insertion to the spanbuffer. This is a linear linked list insertion-sort in Quake (IIRC-need to re-read it!), and that is going to hurt because it relies on PVS and clipping to dodge unwanted insertions and hidden surface can't be done until after insertion of all potentially visible edges.
2) Span drawing by CPU.

I have had a tested solution for (1) since around 1997, so that should be ok :-). (2) is a matter of writing very fast 68k and crossing fingers.
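To make the cost in (1) concrete, here is a generic sketch of insertion into a linked list kept sorted by x. It is not Quake's code and not the 1997 solution mentioned above; the struct and function names are hypothetical. The point is only that each insert walks the list from the head, so the work per insert grows with the number of edges already present.

```c
#include <assert.h>
#include <stdint.h>

typedef struct edge {
    int32_t x;             /* sort key: screen x of the edge on this scanline */
    struct edge *next;
} edge;

/* Insert e into a list sorted by ascending x and return the new head.
   Linear in the number of edges already in the list. */
static edge *edge_insert_sorted(edge *head, edge *e)
{
    edge **link = &head;

    assert(e != 0);
    /* Walk until the next edge has a larger key; equal keys keep
       their insertion order. */
    while (*link && (*link)->x <= e->x)
        link = &(*link)->next;

    e->next = *link;
    *link = e;
    return head;
}
```

With n potentially visible edges on a scanline the total is on the order of n squared comparisons in the worst case, which is why avoiding unwanted insertions matters so much.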
Dal
Administrator
Posts: 4224
Joined: Mon Feb 20, 2006 9:00 pm
Location: Cheltenham, UK

Re: Quake 2 on Falcon030

Post by Dal »

Would the fpu help at all? sourcing and installing one is relatively simple enough.
EvilFranky
Atari Super Hero
Posts: 926
Joined: Thu Sep 11, 2003 10:49 pm
Location: UK

Re: Quake 2 on Falcon030

Post by EvilFranky »

By the sounds of it Doug has removed all the dependencies of the FPU from the Falcon version of the engine, converted from floats to fixed point as this is what the DSP excels at.
dml
Fuji Shaped Bastard
Posts: 3978
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

EvilFranky wrote:By the sounds of it Doug has removed all the dependencies of the FPU from the Falcon version of the engine, converted from floats to fixed point as this is what the DSP excels at.
This is true, however some of the higher functions are left using floating point for a variety of reasons. So the project does still depend on FPU.

The main gain is precision for collision detection. Converting that to fixedpoint could make it unstable. You've probably encountered some games with flaky collision detection, caused by faulty code, design or bad optimizations and precision problems. It's not nice. The Quake games had decent, stable collision detection for the time. If it ends up being so slow that it must be converted then it can be - but it will be much harder and more problematic than leaving it alone, using FPU.

It's a matter of bandwidth. If the FPU can be given low bandwidth, high precision work then it can excel. The high bandwidth stuff needs re-routed though to avoid bottlenecking through the FPU.

The FPU can perform a 64x64=80 bit float multiply, taking hundreds of clocks, at 16MHz.
The DSP can perform a 24x24 bit fixedpoint multiply into its 56 bit accumulator, taking 2 clocks at 32MHz.

There is no competition between them concerning bandwidth here: the DSP is at least 100x faster. Main problem is that some things (like very high precision calculations) don't translate well to DSP, or end up being so complicated it's best left alone.
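For readers who have not used the DSP's number format, here is a small C emulation of a signed 24-bit fractional multiply of the kind it does in hardware. It is a sketch, assuming the usual 1.23 fractional convention; frac24, frac24_mul and frac48_high are hypothetical names, and it models only the arithmetic, not the timing.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t frac24;   /* 1.23 fraction held in the low 24 bits, sign-extended */

/* Multiply two 1.23 fractions. The raw 24x24 product is doubled so the
   result is a 1.47 fraction, which is how fractional multipliers align it. */
static int64_t frac24_mul(frac24 a, frac24 b)
{
    assert(a >= -0x800000 && a <= 0x7FFFFF);
    assert(b >= -0x800000 && b <= 0x7FFFFF);
    return ((int64_t)a * (int64_t)b) * 2;
}

/* Keep the top 24 bits of a 1.47 product as a 1.23 result (truncating).
   Relies on arithmetic right shift of negative values (true for GCC).
   Note -1.0 * -1.0 does not fit in 1.23; this sketch does not saturate. */
static frac24 frac48_high(int64_t product)
{
    return (frac24)(product >> 24);
}
```

In this format 0x400000 is 0.5, so frac48_high(frac24_mul(0x400000, 0x400000)) gives 0x200000, which is 0.25. Values outside the range -1.0 to just under 1.0 have to be pre-scaled, which is part of why very high precision work does not translate well.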