Quake 2 on Falcon030


CiH
Atari God
Posts: 1136
Joined: Wed Feb 11, 2004 4:34 pm
Location: Middle Earth (Npton) UK

Re: Quake 2 on Falcon030

Postby CiH » Fri Nov 07, 2014 9:28 pm

The timeouts on AF now are getting quite extreme.


It's a subtle hint from the god of coding - "More coding, less posting!" :mrgreen:
"Where teh feck is teh Hash key on this Mac?!"

Eero Tamminen
Atari God
Posts: 1996
Joined: Sun Jul 31, 2011 1:11 pm

Re: Quake 2 on Falcon030

Postby Eero Tamminen » Fri Nov 07, 2014 9:58 pm

dml wrote:It's so slow just now that I couldn't record a video in Hatari - partly because it is slow to start with (about 1fps even with Hatari's fast FPU) and partly because Hatari seems to go into ultra-low-gear while recording videos. Not sure why - the overhead for recording should be pretty fixed, but everything gets 10x slower and 1fps drops to 0.01fps..... I have no explanation for that :) just have to live with it until it gets speeded up...


Hatari AVI video recording speed is completely [1] bound by your system PNG library compression speed. Or if you're using uncompressed (BMP) frame content instead, then recording speed is probably bound by your disk write speed, as there's a lot more data to write out.

[1] I just profiled a short piece of Falcon bootup in Hatari with Valgrind's Callgrind, with AVI recording & zooming enabled:
- 83% PNG compression
- 9% DSP emulation
- 8% everything else

If you don't want to use the BMP AVI video "codec" (--avi-vcodec bmp), I would suggest doing the following to reduce the amount of data that needs to be compressed/recorded:
1. disabling zooming (-z 1)
2. cropping statusbar out (--crop on)
3. using higher frame skip value (--frameskips <X>)

To reduce CPU overhead, you might also disable sound output both in Hatari (--sound off) and Quake.
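
Put together, the options above could be combined into one command line. This is only a sketch: the `hatari` executable name and the `recording_args` helper are assumptions for illustration, the flags are the ones quoted in the list, and the frame skip value is a placeholder to tune.

```python
def recording_args(frameskips, use_bmp=False):
    """Build a Hatari command line from the options suggested above.

    The executable name and this helper are hypothetical; only the
    flags themselves come from the post.
    """
    args = ["hatari",
            "-z", "1",                        # no zooming
            "--crop", "on",                   # crop the statusbar out
            "--frameskips", str(frameskips),  # record fewer frames
            "--sound", "off"]                 # less CPU overhead
    if use_bmp:
        args += ["--avi-vcodec", "bmp"]       # uncompressed frames
    return args

print(" ".join(recording_args(4)))
```

Whether BMP or PNG ends up faster depends on whether the disk or the CPU is the bottleneck on the host, so it is worth trying both.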

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Sat Nov 08, 2014 9:33 am

Eero Tamminen wrote:Hatari AVI video recording speed is completely [1] bound by your system PNG library compression speed. Or if you're using uncompressed (BMP) frame content instead, then recording speed is probably bound by your disk write speed, as there's a lot more data to write out.
...
If you don't want to use the BMP AVI video "codec" (--avi-vcodec bmp), I would suggest doing the following to reduce the amount of data that needs to be compressed/recorded:
1. disabling zooming (-z 1)
...


Ok that's interesting - so the PNG compression is getting much slower then because the content is more complex? Textured pixels vs flat filled polys? Because the window size does not change, while the speed drop is very significant. I have been cropping out borders and the status bar in both cases.

I'll try writing out BMP streams then to see if it is more interactive. I just need something that can keep up with movement in roughly realtime.

calimero
Fuji Shaped Bastard
Posts: 2310
Joined: Thu Sep 15, 2005 10:01 am
Location: STara Pazova, Serbia

Re: Quake 2 on Falcon030

Postby calimero » Sat Nov 08, 2014 9:42 am

PNG compression relies heavily on the differences between neighbouring pixels, so flat polys are certainly easier to process than this textured "mess" :D
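
A quick way to see the effect is to feed a flat frame and a noisy frame to a deflate compressor. This is a rough sketch only: Python's `zlib` stands in for the PNG library, the frames are synthetic 8-bit buffers (not real Hatari output), and random texels are a deliberately pessimistic stand-in for textured walls.

```python
import random
import zlib

WIDTH, HEIGHT = 320, 160  # window size assumed for illustration

def deflate_size(pixels, level=9):
    """Size in bytes of the zlib (deflate) stream for one 8-bit frame."""
    return len(zlib.compress(bytes(pixels), level))

# Flat-filled frame: one colour everywhere.
flat = [40] * (WIDTH * HEIGHT)

# Stand-in for a textured frame: pseudo-random texels (worst case).
rng = random.Random(1)
textured = [rng.randrange(256) for _ in range(WIDTH * HEIGHT)]

print(deflate_size(flat), deflate_size(textured))
```

The flat frame collapses to a tiny stream while the noisy one barely shrinks, and the compressor has to search much harder for matches in the second case.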

btw
if we make a long-shot comparison between Mikro's Quake I port, which runs at around 15-20fps on an 80MHz 060 (120 MIPS), and
your code, currently running at 1fps on a 16MHz 030 (6 MIPS), you already have the advantage :)
using Atari since 1986. http://wet.atari.org http://milan.kovac.cc/atari/software/ ・ Atari Falcon030/CT63/SV ・ Atari STe ・ Atari Mega4/MegaFile30/SM124 ・ Amiga 1200/PPC ・ Amiga 500 ・ C64 ・ ZX Spectrum ・ RPi ・ MagiC! ・ MiNT 1.18 ・ OS X

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Sat Nov 08, 2014 10:38 am

It's difficult to compare at this point - different games, different scenery. The pixel routine is definitely slower here since it is using a 'perfect' mapper but the image is also smaller at 160 lines, and no entities... etc. etc.

Will see how it goes when the texturemapping is optimized properly.

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Sat Nov 08, 2014 3:31 pm

Have made a little progress on the quadratic sampling texture mapper as a prototype in 68k.

There are definitely some extra problems to solve, mostly involving shifts and dynamic range. Trying to formulate it so the shifts can be removed and the range for all steps is safe for 23-bit arithmetic.

The inner part is working ok but the outer part (z+=zi) needs more precision than I had accounted for at first so I need to do more work there first. This probably means more breakage near the eye, and breaking the scene into more z-bands.

Only when it is all working within range in 68k will I look at moving steps onto DSP one piece at a time.

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Sat Nov 08, 2014 4:59 pm

This version uses partial FPU quadratic for u-axis and 100% quadratic 68k/fixedpoint for v-axis, for confirmation that each stage is working properly without floating point.

There's still one problem left to solve in the 68k version - removing some unwanted shifts.

bullis1
Fuji Shaped Bastard
Posts: 2301
Joined: Tue Dec 12, 2006 2:32 pm
Location: Canada

Re: Quake 2 on Falcon030

Postby bullis1 » Sun Nov 09, 2014 1:21 am

The Q2 engine running on an unaccelerated Falcon, even at 1fps, is so impressive that it's almost disturbing :D

Q2 is my favourite id software game btw. They will never top it IMO.
Member of the Atari Legend team

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Sun Nov 09, 2014 10:05 am

bullis1 wrote:The Q2 engine running on an unaccelerated Falcon, even at 1fps, is so impressive that it's almost disturbing :D


Well I'm still fairly confident I can get it over 1 FPS so hold tight :)

bullis1 wrote:Q2 is my favourite id software game btw. They will never top it IMO.


I think the multiplayer DM was the perfect combination of seriousness and hilarity - something I didn't see in the other games that tried (they tended to be either one or the other).

For me it was also some of the most impressive scenery visuals in a game back then. When I get it working properly I'll post screenshots of favourite views :)

Eero Tamminen
Atari God
Posts: 1996
Joined: Sun Jul 31, 2011 1:11 pm

Re: Quake 2 on Falcon030

Postby Eero Tamminen » Sun Nov 09, 2014 3:06 pm

dml wrote:Ok that's interesting - so the PNG compression is getting much slower then because the content is more complex? Textured pixels vs flat filled polys? Because the window size does not change, while the speed drop is very significant. I have been cropping out borders and the status bar in both cases.


Hatari sets PNG filtering to none, but uses the highest compression level (9). AFAIK these levels correspond directly to zlib compression levels, and there are some comments in the PNG library header saying that levels 3-6 could achieve nearly the same compression at a clearly smaller cost. I'll send a patch to hatari-devel for a command line option to control the compression level; you could try whether that helps (or just use the uncompressed BMP format).
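
The size/cost trade-off between levels is easy to measure on your own machine. A minimal sketch, assuming Python's `zlib` behaves like the zlib underneath the PNG library; the frame is a synthetic 320x160 8-bit pattern rather than real Hatari output, and `compare_levels` is just an illustrative helper.

```python
import time
import zlib

def compare_levels(data, levels=(1, 6, 9)):
    """Return {level: (compressed_size, seconds)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        results[level] = (len(packed), time.perf_counter() - start)
    return results

# Synthetic frame with some structure (an assumption, not Hatari output).
frame = bytes(((x >> 2) ^ y) & 0xFF for y in range(160) for x in range(320))

for level, (size, seconds) in compare_levels(frame).items():
    print(f"level {level}: {size} bytes in {seconds * 1000:.2f} ms")
```

Timings vary with the host and the frame content, so treat the numbers as relative only.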

You should also note that emulation load depends on what needs to be emulated. Does texturing increase DSP utilization significantly?

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Sun Nov 09, 2014 4:54 pm

Eero Tamminen wrote:Hatari sets PNG filtering to none, but uses the highest compression level (9). AFAIK these levels correspond directly to zlib compression levels, and there are some comments in the PNG library header saying that levels 3-6 could achieve nearly the same compression at a clearly smaller cost. I'll send a patch to hatari-devel for a command line option to control the compression level; you could try whether that helps (or just use the uncompressed BMP format).


Ok that probably explains it then. Context sensitive overhead of PNG compressor.

Yes it would be good to favour performance over disk space - since the latter is cheap and solvable these days. Perhaps fastest compression mode is ideal, so disk speed gets a fair advantage but CPU is not bogged down?

The main problem here was interactivity with mouse control, since behaviour becomes more erratic as the framerate drops. Doesn't matter so much for demos where you just wait a bit longer. Mouse input really suffers if the framerate drops to 0.1fps or worse.

Eero Tamminen wrote:You should also note that emulation load depends on what needs to be emulated. Does texturing increase DSP utilization significantly?



Texturing doesn't use DSP yet - however the problem wasn't with emulation speed itself, but apparent nonlinear loss in speed when recording videos with texturing versus recording videos without texturing.

Without texturing the record mode would slow down by x2 or so, but with texturing it slows down by x10 (and texturing is already slow, turning the process into a slideshow)

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Sun Nov 09, 2014 4:57 pm

So I made a little progress but not much.

Instead of getting stuck with the texture mapping math stuff - I got stopped in my tracks by an unexpected thing, which turned out to be an assembler (or possibly even emulation) fault - the code I was writing is not the code that was getting executed...

Now that I know of that fault and that it is to blame, I can probably work around it by using different instructions.

...or I may just need to update my assembler!

[EDIT]

Turned out to be a weird Hatari debugger disassembly error, so I'm still looking for the real fault.

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Mon Nov 10, 2014 12:12 pm

I did eventually get un-stuck, both problems being related to emulation in some form.

So I should soon be able to continue converting the code to 68k and DSP.

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Mon Nov 10, 2014 1:14 pm

Got a little time just now to make a bit more progress - the calculations now fully eliminate the FPU from the interior of pixel spans, and partly from the span setup code - with everything done in 68k, and a very small piece (z,uz,vz stepping) has been moved to the DSP already, mainly to prove ranges are ok.

The screenshot below shows the quadratic method breaking down very near the eye - this happens with the original FPU version as well, and is caused by the dynamic range for z being too broad for the approximation I selected. It can be corrected by adjusting the normalized z-range for near polygons.

grab0031.png
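
One plausible way to see why the approximation degrades near the eye: with evenly spaced table entries, the segments closest to z=0 cover a much wider ratio of z values, and a single quadratic cannot follow 1/z across them. The sketch below is an illustration under assumptions (1024 even entries over z in [0, 1), plain 3-point interpolation, floating point) and is not the actual table used in the port.

```python
def fit_quadratic(x0, x1, x2):
    """Quadratic through three points of 1/z, returned as (A, B, C)."""
    y0, y1, y2 = 1.0 / x0, 1.0 / x1, 1.0 / x2
    d01 = (y1 - y0) / (x1 - x0)          # first divided differences
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)          # second divided difference
    # Newton form expanded to A*x^2 + B*x + C
    return a, d01 - a * (x0 + x1), y0 - d01 * x0 + a * x0 * x1

def segment_error(index, entries=1024, samples=257):
    """Worst relative error of the fit for one table entry."""
    lo, hi = index / entries, (index + 1) / entries
    A, B, C = fit_quadratic(lo, 0.5 * (lo + hi), hi)
    worst = 0.0
    for k in range(samples):
        z = lo + (hi - lo) * k / (samples - 1)
        worst = max(worst, abs(((A * z + B) * z + C) * z - 1.0))
    return worst

for index in (1, 4, 64, 512):
    print(index, segment_error(index))
```

The error falls off steeply with distance from the eye, which fits the idea of renormalizing the z-range (or using separate z-bands) for near polygons.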


I'm mainly concerned right now over any 'new' problems introduced by eliminating the FPU and formatting the values to work in the ranges allowed by the DSP - not yet worried about performance in this early version.

I had mostly tested all of that on the PC but there were a few bits I didn't fully convert, seeming less important, and it is very easy to miss something important and slam into it at the last minute.

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Mon Nov 10, 2014 6:21 pm

A bit more rapid hacking just now and it is working on the DSP.

This proves that quadratic sampling can be used for texturemapping Quake stuff on a dusty old Falcon.

grab0032.png


This version is not exactly fast, but it's a lot faster than the FPU version :) And a few problems aside, it appears to still work properly. There is plenty of room to optimize.

The FPU is still being used to set up terms for stepping across pixels on every span, which causes a lot of slowdown when viewing anything more than a simple flat surface. But it seems to average about 2-3fps despite that in simpler scenes.

It will easily hit 4fps+ when looking straight at the floor, which is probably indicative of fillrate currently for 320x160 pixels. That's pretty much on target for a non-optimized version, and I had never intended to texture a 320 pixel wide window - even BadMooD doesn't manage that too well on a stock machine. The aim was 160x chunky columns.
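
As a back-of-envelope check of those numbers: the sketch below assumes the stock 16MHz CPU clock mentioned earlier in the thread, naively ignores that the work is split between CPU and DSP, and assumes the 160-column target keeps the same 160-line height.

```python
CPU_HZ = 16_000_000  # stock Falcon030 68030 clock

def pixels_per_second(width, height, fps):
    """Fill rate implied by a window size and frame rate."""
    return width * height * fps

rate = pixels_per_second(320, 160, 4)   # 204800 pixels/s at 4fps
cycles_per_pixel = CPU_HZ / rate         # 78.125 CPU cycles per pixel
fps_at_160_wide = rate / (160 * 160)     # 8.0 fps at the same fill rate
print(rate, cycles_per_pixel, fps_at_160_wide)
```

So halving the window width would roughly double the frame rate before any optimization, as long as fill rate stays the limiting factor.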

So, IdTech #2 on F030 is not dead yet :) Not properly alive and healthy, but not dead either.

Anima
Atari Super Hero
Posts: 667
Joined: Fri Mar 06, 2009 9:43 am

Re: Quake 2 on Falcon030

Postby Anima » Mon Nov 10, 2014 6:26 pm

Great progress. Simply unbelievable. Everything sounds "unreal" especially considering the specs of the machine. This shows the potential of the system in a spectacular way. :cheers:

kristjanga
Captain Atari
Posts: 400
Joined: Sat Jul 25, 2009 3:35 pm

Re: Quake 2 on Falcon030

Postby kristjanga » Mon Nov 10, 2014 10:03 pm

This is looking so good! ;)

ctirad
Captain Atari
Posts: 278
Joined: Sun Jul 15, 2012 9:44 pm

Re: Quake 2 on Falcon030

Postby ctirad » Mon Nov 10, 2014 10:16 pm

This is just Awesome. Congratulations Doug. :cheers:

nemodhs
Atari User
Posts: 38
Joined: Sat Aug 31, 2013 2:29 pm

Re: Quake 2 on Falcon030

Postby nemodhs » Mon Nov 10, 2014 10:31 pm

Just stumbled upon this thread.

This looks absolutely amazing.

I love reading about your progress.

Keep it up! :)

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Mon Nov 10, 2014 10:36 pm

Anima wrote:Great progress. Simply unbelievable. Everything sounds "unreal" especially considering the specs of the machine. This shows the potential of the system in a spectacular way. :cheers:


Thanks Sascha. I'm also impressed with your ChoRenSha project - seeing the Aranym video running caused me some confusion, took a few seconds to get my head around the fact it was a TOS version and not the original!

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Mon Nov 10, 2014 10:37 pm

kristjanga wrote:This is looking so good! ;)


ctirad wrote:This is just Awesome. Congratulations Doug. :cheers:


Cheers everyone.

I recorded a short video but YT completely destroyed it - doesn't like the low fps, and just shows still frames. Will try again and if it doesn't work will wait until it is a bit more optimized and might encode better.

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Mon Nov 10, 2014 10:53 pm

A very early peek at the old F030 rendering Q2 with textures in realtime...

https://www.youtube.com/watch?v=EJY93JT ... e=youtu.be

This is Hatari - not tried it yet on real HW, but it's probably a bit pointless until I remove the last remnants of 68882 code, which is currently still bogging it down quite much.

calimero
Fuji Shaped Bastard
Posts: 2310
Joined: Thu Sep 15, 2005 10:01 am
Location: STara Pazova, Serbia

Re: Quake 2 on Falcon030

Postby calimero » Mon Nov 10, 2014 11:37 pm

dml wrote:A very early peek at the old F030 rendering Q2 with textures in realtime...

https://www.youtube.com/watch?v=EJY93JT ... e=youtu.be

This is Hatari - not tried it yet on real HW, but it's probably a bit pointless until I remove the last remnants of 68882 code, which is currently still bogging it down quite much.

it is almost like Quake 2 C port on Amiga 060/66MHz :D

https://www.youtube.com/watch?v=1h5RRUP4Wyc

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Mon Nov 10, 2014 11:42 pm

calimero wrote:it is almost like Quake 2 C port on Amiga 060/66MHz :D
https://www.youtube.com/watch?v=1h5RRUP4Wyc


You can be sure, it will get faster than this. ;-) A lot faster...


But I will need a break before optimizing it properly. Very busy next week and the week after. Will see how much time I get to play in between.

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Tue Nov 11, 2014 11:45 am

So if anyone wants to have a go at it on the Falcon (or another retro box with a RISC chip inside), here's how the mapper works:

1) First you need to build a table of coefficients for quadratic equations, offline.

This involves solving and storing A,B,C terms in order to later query y=(Ax^2+Bx+C) for any (x), where (x) is essentially the scene (z) term for a given pixel, and the result (y) approximates (1/z). I'm using a table of 1024 equations but you can optimize in either direction to save space or gain range/accuracy.

It should be possible to use linear equations if the table is big enough, and perhaps save some cycles while still getting a decent(ish) approximation - but for the Falcon's DSP it's not much trouble to just do it properly.

Note that the table stores equations - triplets, not single values. This means you're performing a lookup on a set of curves - not single value samples.
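
To make the "table of curves" idea concrete, here is a floating-point sketch of such a table. It is not the author's generator: the normalized z range of [0.25, 1.0), the even spacing, and the plain 3-point interpolation (instead of a best fit) are all assumptions chosen to keep the example short.

```python
def fit_quadratic(x0, x1, x2, f):
    """Quadratic through three points of f, returned as (A, B, C)."""
    y0, y1, y2 = f(x0), f(x1), f(x2)
    d01 = (y1 - y0) / (x1 - x0)          # first divided differences
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)          # second divided difference
    # Newton form expanded to A*x^2 + B*x + C
    return a, d01 - a * (x0 + x1), y0 - d01 * x0 + a * x0 * x1

def build_table(n=1024, z_lo=0.25, z_hi=1.0, f=lambda z: 1.0 / z):
    """One (A, B, C) triplet per evenly spaced slice of [z_lo, z_hi)."""
    step = (z_hi - z_lo) / n
    return [fit_quadratic(z_lo + i * step,
                          z_lo + (i + 0.5) * step,
                          z_lo + (i + 1) * step, f)
            for i in range(n)]

def lookup(table, z, z_lo=0.25, z_hi=1.0):
    """Pick the curve covering z and evaluate A*z^2 + B*z + C."""
    n = len(table)
    i = min(int((z - z_lo) / (z_hi - z_lo) * n), n - 1)
    A, B, C = table[i]
    return (A * z + B) * z + C

table = build_table()
print(lookup(table, 0.4), 1.0 / 0.4)
```

Each entry answers for a whole slice of z, so the lookup index only needs the top bits of z while the low bits still contribute through the curve itself.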

Generating the table is a bit hard - it involves performing a best-fit on equations using a set of sample points on each curve. I use subdivision but random sampling may work. For most of the table entries, the same 3 points will converge to the best fit, but for entries near the ends of the table the choices will move due to clamping effects enforced on A,B,C for the legal fixedpoint range. It's important to be aware of this detail or you'll get stuck. There are a few gotchas involved in generating the table and due to the nature of best-fit algorithms, you can end up with a broken solver that looks like it is nearly working - beware.
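
A toy version of the subdivision idea, for one table entry: move the two outer sample points inwards and keep the placement with the smallest worst-case error. This is an assumption-laden sketch (floating point, symmetric inset, ternary-style subdivision, no clamping of A,B,C) and may differ from the author's solver; `best_fit` and `worst_error` are illustrative names.

```python
def fit_quadratic(x0, x1, x2):
    """Quadratic through three points of 1/z, returned as (A, B, C)."""
    y0, y1, y2 = 1.0 / x0, 1.0 / x1, 1.0 / x2
    d01 = (y1 - y0) / (x1 - x0)
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)
    return a, d01 - a * (x0 + x1), y0 - d01 * x0 + a * x0 * x1

def worst_error(coef, lo, hi, samples=129):
    """Largest relative error of the curve against 1/z on [lo, hi]."""
    A, B, C = coef
    worst = 0.0
    for k in range(samples):
        z = lo + (hi - lo) * k / (samples - 1)
        worst = max(worst, abs(((A * z + B) * z + C) * z - 1.0))
    return worst

def best_fit(lo, hi, rounds=24):
    """Subdivide the inset of the outer sample points; return (error, inset, coef)."""
    mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)

    def trial(inset):
        reach = (1.0 - inset) * half
        coef = fit_quadratic(mid - reach, mid, mid + reach)
        return worst_error(coef, lo, hi), inset, coef

    best = trial(0.0)                    # sample points at the segment ends
    t_lo, t_hi = 0.0, 0.5
    for _ in range(rounds):
        third = (t_hi - t_lo) / 3.0
        left, right = trial(t_lo + third), trial(t_hi - third)
        if left[0] < right[0]:
            t_hi = right[1]              # minimum lies left of `right`
        else:
            t_lo = left[1]               # minimum lies right of `left`
        best = min(best, left, right, key=lambda result: result[0])
    return best

err, inset, coef = best_fit(0.5, 0.5 + 1.0 / 1024)
print(err, inset)
```

For entries away from the table ends the search settles on nearly the same relative placement every time, which matches the observation that the same 3 points tend to converge to the best fit.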

Despite those problems, it's relatively easy to understand/test in floating point because A,B,C can be kept in their natural range. A fixed-point version however is much more difficult since the terms need to be normalized to maximize use of available bits, and for optimal precision they must be differently normalized. This part is a challenge but it can be shown to (just) work with as few as 23 bits + sign for all source terms.
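
"Differently normalized" can be illustrated by giving each coefficient column its own power-of-two scale so its largest value just fits in 23 bits + sign. The sketch below is hypothetical: it uses Taylor-style coefficients of 1/z around three sample depths (A = 1/z0^3, B = -3/z0^2, C = 3/z0) rather than a fitted table, and `shift_for` / `quantize` are illustrative helpers, not the author's code.

```python
LIMIT = (1 << 23) - 1  # 23 magnitude bits + sign

def shift_for(max_abs):
    """Largest left shift that keeps round(max_abs * 2**s) within LIMIT."""
    s = 0
    while round(max_abs * 2 ** (s + 1)) <= LIMIT:
        s += 1
    return s

def quantize(triplets):
    """Scale each column of (A, B, C) by its own power of two."""
    shifts = [shift_for(max(abs(t[k]) for t in triplets)) for k in range(3)]
    fixed = [tuple(round(t[k] * 2 ** shifts[k]) for k in range(3))
             for t in triplets]
    return shifts, fixed

# Expanding 1/z around z0 gives z^2/z0^3 - 3z/z0^2 + 3/z0.
triplets = [(1 / z0 ** 3, -3 / z0 ** 2, 3 / z0) for z0 in (0.25, 0.5, 1.0)]
shifts, fixed = quantize(triplets)
print(shifts)
print(fixed)
```

Because A, B and C have very different magnitudes, a single shared scale would waste most of the bits of the smaller terms; the runtime code then has to undo each scale with its own shift.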

2) Implement the runtime part, which efficiently performs y=(Ax^2+Bx+C).

For this to be efficient, you really need a RISC device with a multiply-accumulate and fast shifting capability. Or at the very least, a very fast multiplier and careful coding. Unfortunately the Falcon's DSP is terrible at shifting and does present some problems of its own here, getting it to work fast. Left as an exercise for the reader ;)

The transform looks a bit like this:

normbits = 23; // for Falcon's 24/48 DSP accumulator - 1 bit auto denorm on this device. should be 32 for a 64bit RISC accumulator.
qbits = 13; // 10 table bits + 13 precision bits == 23
tbits = 8; // arbitrary fraction retained for texture u,v, multiply precision

// during setup, get z, uz, vz normalized into fixedpoint range
z *= (int)(1<<normbits);

// per pixel: look up the curve for this z and evaluate A*x^2 + B*x + C
// (products need a double-width accumulator)
x = (int)pixel_z;
ix = (x >> qbits); // top 10 bits of x select the table entry
A = qtab[ix].A;
B = qtab[ix].B;
Q = qtab[ix].C; // C already shifted by (tbits)
Q += (B*x) >> (normbits-tbits); // B*x term
Q += (((x*x)>>normbits) * A) >> (normbits-tbits); // A*x^2 term: square x first, then scale by A
return Q;
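
For anyone who wants to check the integer arithmetic on a PC first, here is a self-contained sketch of the same transform. It is a variant, not the author's exact normalization (which the post does not spell out): the per-column shifts, the 16-bit result fraction, the nearest handled z of 0.25 and the plain 3-point fit are all assumptions. Python integers never overflow, whereas real hardware needs a 48-bit or wider accumulator for the products.

```python
NORMBITS = 23                      # fractional bits of x, as in the post
QBITS = 13                         # x >> QBITS is the 10-bit table index
OUTBITS = 16                       # fractional bits of the result (assumption)
Z_NEAR = 0.25                      # nearest normalized z handled (assumption)
ENTRIES = 1 << (NORMBITS - QBITS)  # 1024 table entries
LIMIT = (1 << 23) - 1              # 23 magnitude bits + sign

def fit_segment(lo, hi):
    """Quadratic through 1/z at lo, middle and hi, as (A, B, C)."""
    x0, x1, x2 = lo, 0.5 * (lo + hi), hi
    y0, y1, y2 = 1.0 / x0, 1.0 / x1, 1.0 / x2
    d01 = (y1 - y0) / (x1 - x0)
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)
    return a, d01 - a * (x0 + x1), y0 - d01 * x0 + a * x0 * x1

def shift_for(max_abs):
    """Largest left shift that keeps round(max_abs * 2**s) within LIMIT."""
    s = 0
    while round(max_abs * 2 ** (s + 1)) <= LIMIT:
        s += 1
    return s

def build_table():
    """Integer (A, B, C) rows plus the shift used for each column."""
    first = int(Z_NEAR * ENTRIES)  # entries below this reuse the nearest valid curve
    rows = []
    for i in range(ENTRIES):
        k = max(i, first)
        rows.append(fit_segment(k / ENTRIES, (k + 1) / ENTRIES))
    shifts = [shift_for(max(abs(r[c]) for r in rows)) for c in range(3)]
    table = [tuple(round(r[c] * 2 ** shifts[c]) for c in range(3)) for r in rows]
    return table, shifts

def reciprocal(table, shifts, x):
    """Approximate 1/z in OUTBITS fixed point, for x = z * 2**NORMBITS."""
    sa, sb, sc = shifts
    a, b, c = table[x >> QBITS]
    x2 = (x * x) >> NORMBITS                    # z^2, renormalized
    q = c >> (sc - OUTBITS)                     # C term
    q += (b * x) >> (sb + NORMBITS - OUTBITS)   # B*x term
    q += (a * x2) >> (sa + NORMBITS - OUTBITS)  # A*x^2 term
    return q

table, shifts = build_table()
x = 3 << 21                                     # z = 0.375
print(reciprocal(table, shifts, x) / (1 << OUTBITS), 1 / 0.375)
```

With these choices most of the remaining error comes from the three truncating shifts rather than from the curve fit, which is one reason the normalization of each term matters so much.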

On the DSP it looks a bit like this (not optimized, not scheduled and missing some details):

Code:

;   x0 = z
    ; ..
    ; schedule these moves elsewhere, fuse across >1 iteration
    ; ..
    move                    y:qtab_ptr,a
    move                    y:rshft12,y0
    mac     x0,y0,a         y:c_FFFFFE,y0   ; &qtab[(X>>12)]
    and     y0,a                            ; &qtab[(X>>13)*2]
    move    a,r4
    ; ..
    ; schedule u,v part here, overlap x/y access if table overlaps low memory etc. etc.
    ; ..
    move                    x:(r4),b        ; C
    mpy     x0,x0,a         y:(r4)+,y0      ; B
    mac     y0,x0,b a,x1    y:(r4)+,y0      ; A
    mac     y0,x1,b                         ;
    ; ..
    move    b,x0                            ; 1/z

...
now multiply x0 (1/z) by uz,vz and combine into texture address. uz,vz should be pre-normalized.
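
That last step might look like the sketch below in integer form. Everything specific here is an assumption for illustration: 16 fractional bits for both 1/z and the pre-normalized uz,vz terms, and a 64x64 wrapping texture addressed row by row; the real renderer's texture layout may be quite different.

```python
OUTBITS = 16    # fractional bits of inv_z (assumption)
UVBITS = 16     # fractional bits of uz, vz (assumption)
TEX_SHIFT = 6   # log2 of texture width and height: 64x64 texels (assumption)

def texel_address(inv_z, uz, vz):
    """Recover u and v from uz, vz and 1/z, then pack them into a texel offset."""
    u = (uz * inv_z) >> (OUTBITS + UVBITS)   # integer texel column
    v = (vz * inv_z) >> (OUTBITS + UVBITS)   # integer texel row
    mask = (1 << TEX_SHIFT) - 1              # wrap inside the texture
    return ((v & mask) << TEX_SHIFT) | (u & mask)

# z = 0.5, u = 10, v = 3: inv_z = 2.0, uz = 5.0, vz = 1.5 in fixed point
print(texel_address(2 << 16, 5 << 16, 3 << 15))
```

On the DSP the two multiplies and the packing would be interleaved with the table lookup for the next pixel rather than done as separate steps.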

