Quake 2 on Falcon030

EvilFranky
Atari Super Hero
Posts: 870
Joined: Thu Sep 11, 2003 10:49 pm
Location: UK

Re: Quake 2 on Falcon030

Postby EvilFranky » Tue Nov 11, 2014 12:08 pm

dml wrote:
calimero wrote:it is almost like Quake 2 C port on Amiga 060/66MHz :D
https://www.youtube.com/watch?v=1h5RRUP4Wyc


You can be sure, it will get faster than this. ;-) A lot faster...


But I will need a break before optimizing it properly. Very busy next week and the week after. Will see how much time I get to play in between.


I guess it also kinda shows the potential for a properly ASM-optimized 060 version. A CT60 version should effectively destroy anything they've done on a 68k Amiga. Better colour, better sound and faster.

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Tue Nov 11, 2014 12:14 pm

EvilFranky wrote:I guess it also kinda shows the potential for a properly ASM-optimized 060 version. A CT60 version should effectively destroy anything they've done on a 68k Amiga. Better colour, better sound and faster.


CT60+SV is a deeply evil combination with a lot of potential. It should stomp on pretty much everything in the 'retro arena'. But it probably takes a lot of time and effort to get the best out of it. I doubt anyone has seen what it is capable of.

I will eventually get around to playing with it but I'm having too much fun on the old machine still in what spare time I can get.

I probably wouldn't do CT60 stuff just because the machine is faster, but because I think some interesting problem needs to be tried on it, to see where it can go.

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Wed Nov 12, 2014 12:40 pm

Last night I managed to remove all of the FPU operations from individual spans. This approximately doubled the speed of the renderer. Bearing in mind that Hatari's FPU is not cycle accurate and is in fact 'Pentium-fast', the speed on real hardware will have much more than doubled (I never made the effort to measure on HW, because I knew performance would be awful - it was probably 5 times slower again than what was seen in Hatari).

grab0034.png


The speed is still capped at around 4.5-5fps when looking at a single surface, but for more complex scenery it has risen from 1-2fps to 2-4fps. So half of the time spent in short spans was spent setting up the span.

The resolution is still 320x160, so the fillrate can be pseudo-doubled (probably a 50% gain in real terms) by using chunky columns (this optimization will be done last).

This involved moving the solving of initial (z, uz, vz) onto the DSP for the first pixel in each span - something that turned out to be much harder than anticipated, because there is one aspect of this which depends a lot on FPU dynamic range - the magnitude of the origin terms.

As the angle between a surface and the eye approaches 90 degrees, the z,u,v gradients become very large. However the origin z,u,v for the left edge of the screen becomes offset by (very large * many pixels) and the resulting offset calculations temporarily exceeded the 48bit numeric range of the DSP in a few stages. Fortunately it could be salvaged once I figured out where the overflows were happening and is now working. There are a couple of hacks in there to make it work but I can tidy it up later.
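
To put rough numbers on the overflow described above, here is a minimal C sketch. It is not the DSP code from this project: the 24-fractional-bit format, the function names and the sample gradients are all assumptions chosen for illustration. It only shows how "gradient * pixel offset" can leave a signed 48-bit range even though each input fits comfortably.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical format: signed fixed point with 24 fractional bits. */
#define FRAC_BITS 24

/* Returns 1 if v is representable as a signed two's-complement
   value that is 'bits' wide, 0 otherwise. */
static int fits_signed(int64_t v, int bits)
{
    assert(bits > 1 && bits < 64);
    int64_t lim = (int64_t)1 << (bits - 1);
    return v >= -lim && v < lim;
}

/* Plane term at the left screen edge: origin = c - grad * pixels.
   'c' and 'grad' are fixed point, 'pixels' is a whole pixel count. */
static int64_t edge_origin(int64_t c, int64_t grad, int32_t pixels)
{
    return c - grad * (int64_t)pixels;
}
```

With a gradient of 1.0 (1 << FRAC_BITS) and a 160 pixel offset the result stays well inside 48 bits. With a gradient of 2^17 (1 << 41 in this format) the same 160 pixel offset no longer fits, which is the kind of intermediate value that would need rescaling or a wider accumulator.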

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Wed Nov 12, 2014 12:42 pm

No effort yet spent on the pixel shader. It remains unoptimized.

For the coders out there, the CPU side currently looks like this.


Code: Select all

.pxlp:
   dspwaitr.0   d3,(a3)         ; sync-poll (delay)
   move.w      (a2),d7         ; tex address
   move.b      (a5,d7.w),d0      ; 8bit surface cache
   move.w      (a6,d0.w*2),(a1)+   ; live palette indirection

   dbra      d2,.pxlp




In theory - with more work - it should end up looking (a bit) more like this:


Code: Select all

   dspwaitr.0   d3,(a3)
.pxlp:
   rept      unroll_size   
   move.w      (a2),d7         ; tex address
   move.w      (a5,d7.l*2),(a1)+   ; 16bit surface cache direct
   endr

   dbra      d2,.pxlp


Remains to be seen if that ideal will be achieved once the DSP side is finalized, but clearly, it *is* going to get faster still.
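
For readers who don't read 68k, the two inner loops above can be approximated in C as follows. This is only a sketch: it assumes the word read from (a2) is a texel offset delivered by the DSP, one per pixel, and models that stream as a plain array called offsets. The function and parameter names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Current path: fetch an 8-bit texel from the surface cache, then
   translate it through the live palette to a 16-bit screen pixel. */
static void draw_span_8bit(uint16_t *dst, const uint16_t *offsets, size_t count,
                           const uint8_t *surface8, const uint16_t *palette)
{
    assert(dst && offsets && surface8 && palette);
    for (size_t i = 0; i < count; i++)
        dst[i] = palette[surface8[offsets[i]]];
}

/* Hoped-for path: the surface cache already holds 16-bit pixels,
   so each screen pixel costs a single lookup. */
static void draw_span_16bit(uint16_t *dst, const uint16_t *offsets, size_t count,
                            const uint16_t *surface16)
{
    assert(dst && offsets && surface16);
    for (size_t i = 0; i < count; i++)
        dst[i] = surface16[offsets[i]];
}
```

The second version drops one dependent memory read per pixel, at the price of a surface cache that is twice the size.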

[EDIT]

While I have a few minutes I dug up the second main reason it should get faster soon. The polygon filler is still expensively calling back into the C code once per face, for hundreds of faces being drawn, to set up the surface data. Aside from most of the calls being redundant, the code on the other side is quite slow and empties the CPU caches, best case.

So there is plenty of room to improve things before it settles.

Code: Select all

   pushall
   ext.l      d7

;   call back into engine & perform surface plane setup math (heavy)
;   todo: this should all be DSP-side anyway
;
;   R_ComputeTexturePlane(face_index);

   move.l      d7,-(sp)
   jsr      _R_ComputeTexturePlane
   move.l      (sp)+,d7
   move.l      d0,psp
   

;   call back into engine & populate surface, obtain source addr
;   todo: inline ultrafast path for ready surfaces
;
;         type=GetSurface(
;            psurf->face_index,
;            psurf->mip_level,
;            &ptex,&ptex_width,&ptex_height);

   move.l      #ptex_height,-(sp)      ; h
   move.l      #ptex_width,-(sp)      ; w
   move.l      #ptex,-(sp)         ; ptex
   move.l      #1,-(sp)         ; mip 0-3
   move.l      d7,-(sp)         ; face_index
   jsr      _GetSurface
   lea      5*4(sp),sp
   move.l      d0,pr_type
   
   
;   surface C struct
   move.l      psp,a2

;   load texture width (max size only 256? ...check it)
   dspwaitw.0
   move.l      pr_tex_width,DSPHost32.w

;   load surface plane EQs
;   todo: this should all be DSP-side anyway

;   load zi
   fmove.s      texpln_zi(a2),fp0
   dsploadfp.x   fp0,fp7,d0

;   load zj
   fmove.s      texpln_zj(a2),fp0
   dsploadfp.x   fp0,fp7,d0

;   load zc
   fmove.s      texpln_zc(a2),fp0
   fscale.w   #-1,fp0            ; todo: eliminate hacks like this
   dsploadfp.x   fp0,fp7,d0

;   load vzi
   fmove.s      texpln_vi(a2),fp0
   fscale.w   #8+11,fp0            ; todo: bake denorm into surface plane, early
   dsploadfp.x   fp0,fp7,d0
;   load vzj
   fmove.s      texpln_vj(a2),fp0
   fscale.w   #8+11,fp0
   dsploadfp.x   fp0,fp7,d0
;   load vzc
   fmove.s      texpln_vc(a2),fp0
   fscale.w   #8+11,fp0
   dsploadfp.x   fp0,fp7,d0

;   load uzi
   fmove.s      texpln_ui(a2),fp0
   fscale.w   #8+11,fp0
   dsploadfp.x   fp0,fp7,d0
;   load uzj
   fmove.s      texpln_uj(a2),fp0
   fscale.w   #8+11,fp0
   dsploadfp.x   fp0,fp7,d0
;   load uzc
   fmove.s      texpln_uc(a2),fp0
   fscale.w   #8+11,fp0
   dsploadfp.x   fp0,fp7,d0


   
   popall

*-------------------------------------------------------*
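
A note on the fscale lines in the listing above: FSCALE multiplies a floating point register by a power of two, so "fscale.w #8+11" scales by 2^19 and "fscale.w #-1" halves the value. The C sketch below mirrors just that step. What the dsploadfp.x macro does with the scaled value is not shown in the post, so the integer conversion in to_fixed is purely an assumption, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply x by 2^k. Scaling by an exact power of two only changes
   the exponent, so no precision is lost for normal values. */
static float scale_pow2(float x, int k)
{
    int n = k < 0 ? -k : k;
    assert(n < 31);
    float s = (float)(1u << n);
    return k < 0 ? x / s : x * s;
}

/* Assumed conversion: scale, then truncate to a signed integer
   with 'frac_bits' fractional bits. */
static int32_t to_fixed(float x, int frac_bits)
{
    return (int32_t)scale_pow2(x, frac_bits);
}
```

For example, to_fixed(0.25f, 8 + 11) gives 131072, i.e. 0.25 * 2^19.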

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Wed Nov 12, 2014 6:26 pm

One more mishmash vid, just for fun:

https://www.youtube.com/watch?v=czt7IJbHD-I

Still not had time to start optimizing it, and unlikely to get time next week (got a lot on during the week, will be too tired in evenings), but will post when something changes...

CiH
Atari God
Posts: 1136
Joined: Wed Feb 11, 2004 4:34 pm
Location: Middle Earth (Npton) UK

Re: Quake 2 on Falcon030

Postby CiH » Wed Nov 12, 2014 6:52 pm

Still not had time to start optimizing it, and unlikely to get time next week (got a lot on during the week, will be too tired in evenings), but will post when something changes...


And it's still faster than Lasers and Men on a stock Falcon! :mrgreen:
"Where teh feck is teh Hash key on this Mac?!"

calimero
Fuji Shaped Bastard
Posts: 2310
Joined: Thu Sep 15, 2005 10:01 am
Location: STara Pazova, Serbia

Re: Quake 2 on Falcon030

Postby calimero » Wed Nov 12, 2014 6:54 pm

dml wrote:One more mishmash vid, just for fun:

https://www.youtube.com/watch?v=czt7IJbHD-I

Still not had time to start optimizing it, and unlikely to get time next week (got a lot on during the week, will be too tired in evenings), but will post when something changes...

hm... so you do not use FPU for drawing frames anymore? (so FPU high-speed of Hatari does not affect FPS in this preview? it should be same on real hardware??)

this is amazing! it is unbelievable that plain Falcon can do something like this!! :D :D what is resolution in this video?

@CiH

I just watched: https://www.youtube.com/watch?v=Qw1PyUkVuZk ("Atari Falcon Games and the fastest doom engine for the system") Doug's code easily outpaces anything seen on Falcon so far!
using Atari since 1986. http://wet.atari.org http://milan.kovac.cc/atari/software/ ・ Atari Falcon030/CT63/SV ・ Atari STe ・ Atari Mega4/MegaFile30/SM124 ・ Amiga 1200/PPC ・ Amiga 500 ・ C64 ・ ZX Spectrum ・ RPi ・ MagiC! ・ MiNT 1.18 ・ OS X

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Wed Nov 12, 2014 7:35 pm

calimero wrote:hm... so you do not use FPU for drawing frames anymore? (so FPU high-speed of Hatari does not affect FPS in this preview? it should be same on real hardware??)


There's a little bit of FPU left in the per-face setup, currently about 3% on Hatari (probably 15%+ on real Falcon). But it's probably getting quite close now.

Real HW will be faster at drawing the pixels too, since Hatari's host port is more throttled for compatibility reasons, and the CPU datacache helps serve reused texels. These benefits won't show very much until some of the other remaining work is done.

No FPU used at all by the drawing routines. That's all gone.

calimero wrote:this is amazing! it is unbelievable that plain Falcon can do something like this!! :D :D what is resolution in this video?


The native image was recorded 320x160 (320 pixel video mode using 160 pixel game window, doubled up to full size with software chunky pixels - not had time to do any Videl trickery yet)

calimero wrote:I just watched: https://www.youtube.com/watch?v=Qw1PyUkVuZk ("Atari Falcon Games and the fastest doom engine for the system") Doug's code easily outpaces anything seen on Falcon so far!


That's quite an old vid of the engine in chunky pixel mode - the latest one is on my channel, with gameplay and music, and *without* chunky pixels :)

BTW my 68k is no better than anyone else's in the Atari scene - but both projects use good coupling of CPU+DSP, some cases probably being a bit unusual. A lot of it comes down to making good choices and replacing a lot of the original code with tight assembler and removing latency.

Zamuel_a
Atari God
Posts: 1235
Joined: Wed Dec 19, 2007 8:36 pm
Location: Sweden

Re: Quake 2 on Falcon030

Postby Zamuel_a » Wed Nov 12, 2014 10:05 pm

dml wrote:One more mishmash vid, just for fun:

https://www.youtube.com/watch?v=czt7IJbHD-I

Still not had time to start optimizing it, and unlikely to get time next week (got a lot on during the week, will be too tired in evenings), but will post when something changes...


This is really impressive! It's not much slower than the flat filled polygon version and you say it can even be faster. :)
ST / STFM / STE / Mega STE / Falcon / TT030 / Portfolio / 2600 / 7800 / Jaguar / 600xl / 130xe

viking272
Captain Atari
Posts: 406
Joined: Mon Oct 13, 2008 12:50 pm
Location: west of London, UK

Re: Quake 2 on Falcon030

Postby viking272 » Wed Nov 12, 2014 11:07 pm

Really impressive work in that last video Doug. Glad to read the updates but sorry to hear you're busy with other stuff! :x

Not sure if this has been seen by some of you...
It's an interesting article on Wolfenstein 3D iPhone dev by John Carmack - who did it himself. He also touches on Quake 3 for the iPhone and changing tic rates. :)

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Wed Nov 12, 2014 11:12 pm

Zamuel_a wrote:This is really impressive! It's not much slower than the flat filled polygon version and you say it can even be faster. :)


Thanks Zamuel, yes there is room to speed it up still - and some unused tricks left in the bag if things get too difficult.

There are actually a lot of small bugs and glitches collecting up which need cleaned out, and I need to deal with some of them before doing much more with it or it will cause problems later. There are lots of little flecks, sparkles caused by precision loss and walking off the end of one texture into another (or empty texture cache pages, which are coloured pink #255). Quake doesn't tile textures so the coordinates must be exact or this happens. The first scanline is also dead, and the sky and liquid surfaces are completely broken.

On the plus side, the appearance will improve once the texture cache is upgraded to 16bit colour. It's currently still 8bit like the PC version.

kristjanga
Captain Atari
Posts: 400
Joined: Sat Jul 25, 2009 3:35 pm

Re: Quake 2 on Falcon030

Postby kristjanga » Wed Nov 12, 2014 11:35 pm

I should make an update on my old video on falcon games :)

calimero
Fuji Shaped Bastard
Posts: 2310
Joined: Thu Sep 15, 2005 10:01 am
Location: STara Pazova, Serbia

Re: Quake 2 on Falcon030

Postby calimero » Thu Nov 13, 2014 12:32 am

dml wrote:
calimero wrote:I just watched: https://www.youtube.com/watch?v=Qw1PyUkVuZk ("Atari Falcon Games and the fastest doom engine for the system") Doug's code easily outpaces anything seen on Falcon so far!


That's quite an old vid of the engine in chunky pixel mode - the latest one is on my channel, with gameplay and music, and *without* chunky pixels :)

I was not talking about BadMood in this particular video but about all the other 3D games/demos on Falcon.

So far most impressive 3D stuff on plain Falcon which I saw come from Tat/Avena or from No/Escape, both have really great great demos.

dml wrote:The native image was recorded 320x160 (320 pixel video mode using 160 pixel game window, doubled up to full size with software chunky pixels - not had time to do any Videl trickery yet)

is there any chance to set up Videl somehow to achieve blurred pixels (at low cost) when it works in double column mode?

@kristjanga it would be nice! :) (and remove Canon Fodder ;))

ctirad
Captain Atari
Posts: 278
Joined: Sun Jul 15, 2012 9:44 pm

Re: Quake 2 on Falcon030

Postby ctirad » Thu Nov 13, 2014 11:59 am

OMG!

BTW, is it possible to get the source code or a binary somewhere? I'd like to test it on the overclocked F030. Same question for BadMood.

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Thu Nov 13, 2014 12:25 pm

A bit unwell with migraine today so pretty useless, can't look at a screen long...

calimero wrote:I was not talking about BadMood in this particular video but about all the other 3D games/demos on Falcon.


doh! :oops:

calimero wrote:So far most impressive 3D stuff on plain Falcon which I saw come from Tat/Avena or from No/Escape, both have really great great demos.


I remember some of the early ones beginning to get really good (incl. from Tat), but I missed a lot while I was away. Been too busy on my own stuff to catch up with all the demos which got released in that period, but slowly getting through some ST and Falcon stuff.

calimero wrote:is there any chance to set up Videl somehow to achieve blurred pixels (at low cost) when it works in double column mode?


At a guess, it may be possible to toggle the left display start by 1 pixel at video refresh rate, and get temporal blur for free. I have never tried - it may or may not work, may or may not work with hardware chunky pixels. I'm sure somebody will try and see, if not already.

Other methods I think must use software. Either 8bit lookups or rotate/mask/shift/add sequences in 16bit colour - expensive.
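
As an illustration of the mask/shift/add approach mentioned above, here is a small C sketch that averages neighbouring 16-bit pixels. It assumes a 5-6-5 bit layout (red in bits 15-11, green in 10-5, blue in 4-0). The function names are invented for the example and nothing here comes from the actual engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Average two 5-6-5 pixels. Clearing the lowest bit of each channel
   first means the halved values cannot bleed into the channel below. */
static uint16_t blend565(uint16_t a, uint16_t b)
{
    return (uint16_t)(((a & 0xF7DEu) >> 1) + ((b & 0xF7DEu) >> 1));
}

/* Soften one row in place by averaging each pixel with its right-hand
   neighbour. The last pixel has no neighbour and is left untouched. */
static void blur_row(uint16_t *row, size_t n)
{
    assert(row != NULL);
    for (size_t i = 0; i + 1 < n; i++)
        row[i] = blend565(row[i], row[i + 1]);
}
```

Even this cheap form costs two masks, two shifts and an add per pixel on top of the extra read, which is why it counts as expensive next to a plain copy.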

I once did bilinear filtering in a software rasterizer on a Pentium-100 using 4bit lookups but that's a different kind of blur :)

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Thu Nov 13, 2014 12:29 pm

ctirad wrote:OMG!
BTW, is it possible to get the source code or a binary somewhere? I'd like to test it on the overclocked F030. Same question for BadMood.


Yeah, apologies for being slow to get BM out. It should be out later this month.

There's a bit more to do with the Q2 project before I post a new binary - I still want to take the remaining FPU code out and optimize the pixel shader (there is an older binary, much earlier in this thread but it is flatshading only and missing a lot of the recent optimization).

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Fri Nov 14, 2014 11:37 am

I had a little time to look at some strategies for optimizing per-face setup.

In the original code, every face gets set up on the way into the rasterization step - partly so the z-plane can be used to generate the zbuffer for scenery surfaces.

By eliminating the zbuffer, it's been possible to move face setup math to the last stage - for faces with physical spans (at least one pixel needing drawn). This cuts down the number of setups quite a bit. But it's still a lot of work and still FPU-bound.

The same is done for surface cache filling, but this was always done at the last stage - I didn't change that part.


It turns out that there are quite a lot of options for speeding it up, just that none of them are quick and easy to test.

1) just move it to the DSP (will take a while to move it all, quite a lot of it). this will create a small delay before each face is drawn - not much parallelism, but it will at least be done fast. it may however not be so accurate and cause more sparkles.

2) have the scanconvertor report to the CPU on the first span generated on any new face during scanning, so the CPU/FPU can set up the face math in parallel during scanning. this will slow down scanning a little bit, but gain parallelism. it also allows the CPU cache to be dedicated to one tight loop of code for that one task instead of fragmenting among many tasks as it is now. however it still requires a pretty optimal version on the CPU/FPU side and might still be too expensive.

3) some kind of split - do half of it in parallel during scanning, and half just before drawing. more complex to get working, might involve transferring more values to DSP per face.

4) leave it as it is, but optimize the hell out of it and recommend overclocking the FPU :) but that would put another FPU-shaped dent in the 'stock machine' thing wouldn't it?


The surface cache filling would definitely benefit from knowing which faces are to be drawn in advance, in order to fill multiple at a time and cache better - although if not done carefully it would be possible to kick out textures needed during the same frame, before they get drawn (inter-frame thrashing). Another strategy could be to move a 'copy' of the surface cache filler to the outer part of the engine, iterating over the PVS a few faces per frame, round-robin, to predictively fill the cache. This would spread out the work better (instead of chugging when you look round a new corner - since many of the faces will be pre-fetched before you get there).
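
The round-robin prefill idea could look roughly like the C sketch below. Everything in it is hypothetical: the prefill_state struct, the per-call budget and the surface_ready flags standing in for the real surface cache are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FACES 64

/* Stand-in for the real surface cache: one 'ready' flag per face. */
static unsigned char surface_ready[MAX_FACES];

/* Builds the surface if it is missing. Returns 1 if work was done. */
static int fill_if_missing(int face)
{
    assert(face >= 0 && face < MAX_FACES);
    if (surface_ready[face]) return 0;
    surface_ready[face] = 1;
    return 1;
}

typedef struct {
    const int *pvs_faces;  /* faces potentially visible from here */
    size_t     count;      /* number of entries in pvs_faces */
    size_t     cursor;     /* where the next call resumes */
} prefill_state;

/* Visit up to 'budget' faces per frame, wrapping round the list.
   Returns how many surfaces actually had to be built. */
static size_t prefill_step(prefill_state *st, size_t budget,
                           int (*fill)(int face))
{
    assert(st && fill);
    size_t built = 0;
    if (st->count == 0) return 0;
    if (st->cursor >= st->count) st->cursor = 0;  /* list may have shrunk */
    if (budget > st->count) budget = st->count;
    for (size_t i = 0; i < budget; i++) {
        if (fill(st->pvs_faces[st->cursor])) built++;
        st->cursor = (st->cursor + 1) % st->count;
    }
    return built;
}
```

Calling prefill_step once per frame with a small budget spreads the fill cost over many frames. The thrashing risk mentioned above still applies: a real version would have to avoid evicting surfaces that the current frame is about to draw.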

I also had a look at the BSP storage for two of the inputs to the surface plane math - the 3D plane array, and texinfo array. Both are indexed off each face, and combined with the view to generate the texture plane in realtime.

I was trying to figure out if there are fewer combinations of those two things, than there are faces - if so it might be worth caching the resulting planes and reusing the results between faces. e.g. if you have a wall with 10 different faces on it, but in fact made up of only 2 textures, it suggests there is 1 plane and 2 texinfos shared across 10 faces, and therefore a lot of reuse.

A quick test with this idea using just-planes or just-texinfos didn't appear to work so either I did it wrong or it has to be done with pair-patterns. When I get more time I'll try it.

In the end that may not work - the Carmack/Abrash surface rasterization algorithm has the capability to rasterize non-convex and even disassociated faces (one 'face' can be made up of separated primitives, but using a single edgelist). This suggests the optimization could have been done offline already, and attempting it here is futile. But I'll soon find out if that's the case - perhaps they avoided dealing with disassociated faces for other reasons elsewhere in the engine, in which case I could take advantage of it.

No real coding time ATM but still time to think about stuff :)

exxos
Fuji Shaped Bastard
Posts: 4933
Joined: Fri Mar 28, 2003 8:36 pm
Location: England

Re: Quake 2 on Falcon030

Postby exxos » Fri Nov 14, 2014 11:46 am

Sounds some awesome progress there :)

I know the feeling about "what works best". I started to look at problems like that as "if it's hard to do, it's probably not worth doing". A good Homer saying I think ;)
4MB STFM 1.44 FD- VELOCE+ 020 STE - Falcon 030 CT60 - Atari 2600 - Atari 7800 - Gigafile - SD Floppy Emulator - PeST - various clutter

http://www.exxoshost.co.uk/atari/ All my hardware guides - mods - games - STOS
http://www.exxoshost.co.uk/atari/last/storenew/ - All my hardware mods for sale - Please help support by making a purchase.
http://ataristeven.exxoshost.co.uk/Steem.htm Latest Steem Emulator

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Fri Nov 14, 2014 11:54 am

CiH wrote:And it's still faster than Lasers and Men on a stock Falcon! :mrgreen:


Yeah OLAM was indeed a bit chuggy for what it was doing - but OTOH they did get a game out for Falcon which is a feat in itself :)

viking272 wrote:Not sure if this has been seen by some of you...
It's an interesting article on Wolfenstein 3D iPhone dev by John Carmack - who did it himself. He also touches on Quake 3 for the iPhone and changing tic rates. :)


Haven't had time to read it all yet, but is bookmarked :)

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Fri Nov 14, 2014 11:59 am

exxos wrote:I know the feeling about "what works best". I started to look at problems like that as "if it's hard to do, it's probably not worth doing". A good Homer saying I think ;)


Yes, value judgments on 'what's worth doing' are a big question mark on ad-hoc, retro projects like this.

My rule of thumb is: 'if it's going to be hard to do, it had better work'. So a bunch of useful things probably still disappear in the gaps. :)

exxos
Fuji Shaped Bastard
Posts: 4933
Joined: Fri Mar 28, 2003 8:36 pm
Location: England

Re: Quake 2 on Falcon030

Postby exxos » Fri Nov 14, 2014 12:05 pm

I often end up in situations where, when I've done something, I think that if I had done it "that way" it would have been a lot better. Though it's learning how things work as you go along which is mostly the problem, so "the other way" does not appear until you finish it "the first way" :lol: A lot of the time I start over and start doing things better, but it's like starting the entire thing over again. Most of the time it just results in a whole long list of projects which simply never get finished. Probably better to finish the thing first, as you can always make optimizations at a later date anyway :)

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Fri Nov 14, 2014 12:16 pm

exxos wrote:I often end up in situations where, when I've done something, I think that if I had done it "that way" it would have been a lot better. Though it's learning how things work as you go along which is mostly the problem, so "the other way" does not appear until you finish it "the first way" :lol: A lot of the time I start over and start doing things better, but it's like starting the entire thing over again. Most of the time it just results in a whole long list of projects which simply never get finished. Probably better to finish the thing first, as you can always make optimizations at a later date anyway :)


Yes that's exactly true. That's also the mantra of the games industry. Better ship something mediocre-to-good than something great which isn't finished, or causes your investors to implode. :D The few that manage to do both keep the others trying...

For this project, I don't aim for anything specific - mainly the process and challenges that interest me. I don't have such a burden :)

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Sun Nov 16, 2014 6:41 am

dml wrote:...

I was trying to figure out if there are fewer combinations of those two things, than there are faces - if so it might be worth caching the resulting planes and reusing the results between faces. e.g. if you have a wall with 10 different faces on it, but in fact made up of only 2 textures, it suggests there is 1 plane and 2 texinfos shared across 10 faces, and therefore a lot of reuse.

A quick test with this idea using just-planes or just-texinfos didn't appear to work so either I did it wrong or it has to be done with pair-patterns. When I get more time I'll try it.


So I figured out what's going on with this - should have been obvious really :)

In fact there is significant sharing between planes and texinfo records, in association with faces. Counting pair-patterns in the scene, I was measuring something like 30% uniqueness, or 2 out of 3 pairs being duplicates. So it should be possible to cache surface plane calculations between faces and save some cycles.

The reason my hacky tests did not work, is because each face has a unique component - the texture uv offset - because textures are not tiled (unlike Doom) in order for each to support a unique lightmap.

So for caching of calculations to make sense, the offset needs separated from the plane and recombined before use on each face. This reduces the effectiveness of caching slightly - but at the very least it leaves some multiplies and adds to be done per face - matrices and divides are part of the plane calculation only and can get cached. So it does seem worthwhile especially while the FPU is involved and is taking a lot of time.
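
One possible shape for such a cache, sketched in C. This is an assumption-laden illustration, not the project's code: the slot count, the hash, the cache_entry layout and the per-frame stamp (used because the plane terms depend on the view, which changes every frame) are all invented here. The per-face uv offset is deliberately left out of the cached data and would be recombined by the caller.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_SLOTS 256  /* hypothetical size, must be a power of two */

typedef struct {
    uint32_t frame;    /* frame this entry was computed for (0 = empty) */
    uint16_t plane;    /* key part 1: index into the plane array */
    uint16_t texinfo;  /* key part 2: index into the texinfo array */
    float    terms[9]; /* offset-free plane terms, filled by the caller */
} cache_entry;

static cache_entry plane_cache[CACHE_SLOTS];

/* Direct-mapped lookup keyed on the (plane, texinfo) pair.
   On a hit, *hit is 1 and the entry's terms can be reused.
   On a miss, *hit is 0, the slot is claimed for this key and
   the caller must compute and store the terms. */
static cache_entry *plane_cache_find(uint32_t frame, uint16_t plane,
                                     uint16_t texinfo, int *hit)
{
    assert(hit != 0 && frame != 0);
    uint32_t slot = (plane * 31u + texinfo) & (CACHE_SLOTS - 1);
    cache_entry *e = &plane_cache[slot];
    *hit = (e->frame == frame && e->plane == plane && e->texinfo == texinfo);
    if (!*hit) {
        e->frame = frame;
        e->plane = plane;
        e->texinfo = texinfo;
    }
    return e;
}
```

A direct-mapped table keeps the lookup to a handful of instructions, which matters when the thing being avoided is only a few dozen FPU operations. Stamping with the frame number avoids having to clear the table every frame.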

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Sun Nov 16, 2014 11:37 am

I'm going to try to set a goal for the next version - 30% increase in speed across the board (25% or better increase in fillrate performance, and 75% or better increase in per-face performance, with not much change in the rest of the engine - BSP etc.). This should bring it closer to a playable sort of speed with less variation, at least on the lower density maps.

I'll also try to clean up some of the glitching and sparkles where polygons join, and corruption of textures at very tight viewing angles.

This to me seems like enough of a gain to be worthwhile, but still realistic enough to spend time on it.

I also looked into object mesh drawing and I think it probably makes sense to avoid texturing on those since the faces will be many and small - the per-face overhead isn't worth the benefit of texturing, at least for the majority of models at the majority of encountered distances. Face colour approximation should be enough (even that is not as simple as it sounds because faces need rasterized to estimate their colour from the correct part of the master texture skin).

If textures are needed for models at point-blank range then it can always be added as a special case since those polygons will contribute to more screen area. It's just very wasteful to do texture plane setup for faces made up of a few pixels.
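
The face colour approximation mentioned above could be done offline or at load time along these lines. The C sketch below walks the bounding box of a triangle in skin-texture space and averages the texels it covers. It assumes an RGB skin and integer texel coordinates; the rgb type and function names are hypothetical and not taken from the Quake 2 source.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb;

/* Signed area test: tells which side of edge a->b the point p is on. */
static int64_t edge(int ax, int ay, int bx, int by, int px, int py)
{
    return (int64_t)(bx - ax) * (py - ay) - (int64_t)(by - ay) * (px - ax);
}

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Average the texels covered by the triangle (tx[i], ty[i]) in a
   w x h skin. Works for either winding order. */
static rgb average_face_colour(const rgb *skin, int w, int h,
                               const int tx[3], const int ty[3])
{
    assert(skin && w > 0 && h > 0);
    int x0 = tx[0], x1 = tx[0], y0 = ty[0], y1 = ty[0];
    for (int i = 1; i < 3; i++) {
        if (tx[i] < x0) x0 = tx[i];
        if (tx[i] > x1) x1 = tx[i];
        if (ty[i] < y0) y0 = ty[i];
        if (ty[i] > y1) y1 = ty[i];
    }
    x0 = clampi(x0, 0, w - 1); x1 = clampi(x1, 0, w - 1);
    y0 = clampi(y0, 0, h - 1); y1 = clampi(y1, 0, h - 1);

    uint32_t sr = 0, sg = 0, sb = 0, n = 0;
    for (int y = y0; y <= y1; y++) {
        for (int x = x0; x <= x1; x++) {
            int64_t e0 = edge(tx[0], ty[0], tx[1], ty[1], x, y);
            int64_t e1 = edge(tx[1], ty[1], tx[2], ty[2], x, y);
            int64_t e2 = edge(tx[2], ty[2], tx[0], ty[0], x, y);
            int inside = (e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                         (e0 <= 0 && e1 <= 0 && e2 <= 0);
            if (!inside) continue;
            const rgb *t = &skin[y * w + x];
            sr += t->r; sg += t->g; sb += t->b; n++;
        }
    }
    if (n == 0) return skin[y0 * w + x0];  /* nothing covered: nearest texel */
    rgb out = { (uint8_t)(sr / n), (uint8_t)(sg / n), (uint8_t)(sb / n) };
    return out;
}
```

Since the result depends only on the skin and the face's texture coordinates, it can be computed once per model face and stored, leaving no texture work at draw time.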

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Postby dml » Mon Nov 17, 2014 12:54 pm

With some more hacking around I was able to prove the texture plane caching idea does actually work and is useful - so I have a C version now which has been rearranged to do that and it works well.

The calculation has been split in two, with about 60% of the work cached between faces (matrix, dots, divide), and 40% performed for every face (matrix, dots). The cache hit rate is about 60%, so overall I guess it saves something like 30-40% of the work for that area. It might be possible to optimize this a bit further by rearranging the calculations, but even the current arrangement would be a decent gain.
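
The 30-40% estimate follows from multiplying the two figures: work is only skipped on the cacheable share, and only when the lookup hits. A tiny C check of that arithmetic, using whole percentages and a made-up helper name:

```c
#include <assert.h>

/* Share of the total per-face work that is skipped, in percent:
   (cacheable share) * (hit rate). Inputs are whole percentages. */
static unsigned saved_percent(unsigned cached_pct, unsigned hit_pct)
{
    assert(cached_pct <= 100 && hit_pct <= 100);
    return cached_pct * hit_pct / 100;
}
```

With 60% cacheable and a 60% hit rate this gives 36%, leaving about 64% of the original plane setup work, before counting the cost of the cache lookup itself.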

The code itself needs to be turned into asm for real gains but that's a separate thing. Might have time for that by Friday evening.

