Bad Mood : Falcon030 'Doom'

kristjanga
Captain Atari
Posts: 400
Joined: Sat Jul 25, 2009 3:35 pm

Re: Bad Mood : Falcon030 'Doom'

Post by kristjanga »

you are the man Douglas
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Post by Eero Tamminen »

I added some visualization improvements to the Hatari profile post-processor (they will be committed to the repo later today).

Only subroutine calls still use normal arrows. Branches & jumps are indicated by a circle at the end of the line, and trap/exception invocations are indicated with dashed lines. Indicating them differently hopefully makes the graph more readable.

See the attached graph of earlier Bad Mood CPU code for an example.

If one chooses to disable the call-instruction-based callgraph filtering, one can also see a few additional styles in use. Subroutine & exception returns have an inverted arrow head, and calls made with non-categorized instructions are shown with dotted lines (+ a diamond at the callee end of the line).

If I get a list of all relevant DSP instruction opcodes, I'll add similar support for DSP code as well.
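
The edge styles described above could be expressed roughly as follows when emitting Graphviz dot edges. This is only a minimal sketch and not the Hatari post-processor's actual code: the category names, `EDGE_STYLES` and `dot_edge` are hypothetical, and the mapping to these particular attributes is an assumption. Only the attribute values themselves (`dashed`, `dotted`, `dot`, `inv`, `diamond`) are standard Graphviz.

```python
# Hypothetical mapping from call type to Graphviz edge attributes.
# An empty dict means "use the default arrow".
EDGE_STYLES = {
    "subroutine_call": {},                                   # normal arrow
    "branch": {"arrowhead": "dot"},                          # circle at the end of the line
    "exception": {"style": "dashed"},                        # trap/exception invocation
    "return": {"arrowhead": "inv"},                          # inverted arrow head
    "other": {"style": "dotted", "arrowhead": "diamond"},    # non-categorized instruction
}


def dot_edge(caller, callee, call_type):
    """Return one dot-language edge statement for the given call type."""
    attrs = EDGE_STYLES[call_type]
    attr_text = ", ".join(f"{key}={value}" for key, value in sorted(attrs.items()))
    suffix = f" [{attr_text}]" if attr_text else ""
    return f'"{caller}" -> "{callee}"{suffix};'


if __name__ == "__main__":
    print("digraph callgraph {")
    print("  " + dot_edge("main", "render_walls", "subroutine_call"))
    print("  " + dot_edge("render_walls", "loop_top", "branch"))
    print("  " + dot_edge("render_walls", "trap_14", "exception"))
    print("}")
```

Keeping the styles in one table means a new call category (for example a DSP-specific one) only needs one extra entry.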
Stefan jL
Atari God
Posts: 1316
Joined: Thu May 09, 2002 3:21 pm
Location: Sweden

Post by Stefan jL »

Now, I am not a programmer, but why the focus on converting the MUS files from Doom to SMF? Would it not be best to use the original MUS files for BM by studying the original Doom source code and seeing how it was done on PC?

MIDI support was very common in PC games, but very few actually used the SMF format... they usually had more flexible formats that were better optimised for PC games, like MUS in this case.

Barry Leitch did PC MIDI music using a tracker interface (based on OctaMED). That's very far from SMF :) http://www.youtube.com/watch?v=Pc2Xowj22hU
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Post by dml »

Stefan jL wrote:Now, I am not a programmer, but why the focus on converting the MUS files from Doom to SMF? Would it not be best to use the original MUS files for BM by studying the original Doom source code and seeing how it was done on PC?
The MUS playing code isn't in the Doom source - it was a separate soundcard layer IIRC which had a different implementation for each type of card, and it was external to Doom itself, a sort of driver. Although I'm fuzzy on the details now - long time ago. Will look at the music related code again before doing the Atari side.


In fact I'd rather not have to convert MUS files to MIDI (or anything else) by hand, but have the BM resource cache do it as it loads the WAD, so it happens invisibly. MIDI is attractive because it's close to MUS, and BM can play MIDI already.

That's the current rationale - although there are certainly other options open.
Stefan jL wrote: Barry Leitch did PC MIDI music using a tracker interface (based on OctaMED). That's very far from SMF :) http://www.youtube.com/watch?v=Pc2Xowj22hU
I remember Barry's work :) although I only vaguely remember the OctaMED tracker...

The aim is really just to have BM play music from any IWAD or PWAD already circulating, and that includes both MUS and MIDI formats. So if MUS-to-MIDI conversion can be done behind the scenes, only MIDI playing is needed, and for native output only a sample mixer is needed to output to the CODEC. If MUS/MIDI becomes too much trouble some other method can be used, but it *might* work out ok.
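
As a rough illustration of the "convert invisibly at load time" idea, a loader can tell the two formats apart from the first four bytes of the lump: MUS data starts with `MUS` followed by byte 0x1A, and a Standard MIDI File starts with `MThd`. The sketch below is written in Python for brevity and is not BM's resource cache code; `music_lump_format`, `load_music_lump` and the `convert_mus_to_midi` callback are hypothetical names, and the converter itself is left out.

```python
MUS_MAGIC = b"MUS\x1a"   # signature at the start of a MUS lump
MIDI_MAGIC = b"MThd"     # header chunk ID at the start of a Standard MIDI File


def music_lump_format(data):
    """Classify a music lump by its leading signature."""
    if data[:4] == MUS_MAGIC:
        return "mus"
    if data[:4] == MIDI_MAGIC:
        return "midi"
    return "unknown"


def load_music_lump(data, convert_mus_to_midi):
    """Return MIDI data for a lump, converting MUS on the fly.

    convert_mus_to_midi is a caller-supplied (hypothetical) function
    taking MUS bytes and returning SMF bytes.
    """
    kind = music_lump_format(data)
    if kind == "midi":
        return data                       # already playable as-is
    if kind == "mus":
        return convert_mus_to_midi(data)  # happens invisibly at load time
    raise ValueError("unrecognised music lump format")
```

With this shape the player only ever sees MIDI data, whichever format the WAD carries.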

Post by Stefan jL »

dml wrote:
The MUS playing code isn't in the Doom source -
Oh.. I did not know that.
I did some googling and found this: http://doomwiki.org/wiki/DMX It appears the Heretic source code might have the music code?

It seems the MUS format was limited to 9 channels because of the Adlib (not a MIDI device, but it used the same music files).
http://doomwiki.org/wiki/MUS

Post by dml »

Stefan jL wrote:Oh.. I did not know that.
I did some googling and found this: http://doomwiki.org/wiki/DMX It appears the Heretic source code might have the music code?
Hehehe - that made good reading :)

Yes looks like maybe DMX code is included with Heretic, although I haven't looked at Heretic at all. So that's another option if the MUS part is relatively standalone and ports well, or is clear enough to be translated.

Post by Eero Tamminen »

dml wrote:In fact I'd rather not have to convert MUS files to MIDI (or anything else) by hand, but have the BM resource cache do it as it loads the WAD, so it happens invisibly. MIDI is attractive because it's close to MUS, and BM can play MIDI already.
And the reason MIDI is attractive is Atari having MIDI ports built-in and MIDI playback taking much less CPU than e.g. MOD music with the same amount of channels...

As I understand it, MUS is a subset of MIDI. I would hazard a guess that it supports only a single instrument per track, whereas in MIDI files events on a track can be for any channel, and the instruments bound to channels can change during the track.

Post by Eero Tamminen »

dml wrote:Yes looks like maybe DMX code is included with Heretic, although I haven't looked at Heretic at all. So that's another option if the MUS part is relatively standalone and ports well, or is clear enough to be translated.
That wiki page says that DMX (library) code isn't included with Heretic, only that Raven includes their DMX related code with Heretic. I'm not sure how useful those "related" parts would be...

Post by Eero Tamminen »

Eero Tamminen wrote:If one chooses to disable the call-instruction-based callgraph filtering, one can also see a few additional styles in use. Subroutine & exception returns have an inverted arrow head, and calls made with non-categorized instructions are shown with dotted lines (+ a diamond at the callee end of the line).

If I get a list of all relevant DSP instruction opcodes, I'll add similar support for DSP code as well.
Thanks to Laurent's list of opcodes, I could add support for this.

Attached is a graph of DSP code with different arrows to indicate different "call" types. There are a couple of nodes where one can see returns going to a named address (middle/bottom of the graph). Does it look sane?

Post by dml »

The graph does look sensible but I'll need more time with it to figure out if the call type indication is working properly. I'll let you know when I've done that.

Aside: I recently lost a drive in one of my (non-Atari) machines which had a heap of backups and old stuff on it. I thought it was a lost cause - the partition and directory areas were peppered with bad sectors - but I have been able to piece the drive back together again using some good tools. Between this and other things happening this month I've been unable to do any coding. Should clear soon and I'll be able to do some work on BM again.

Post by dml »

Have restarted this after a slew of random problems from all directions. Progress slow just now but will pick up again.

Working on better buffering of map vertex data etc. into the DSP, reworking the perspective mapping to use less code and math, trying a reciprocal cache for redundant z-divides within the same scene (not really needed, but interesting) - and some other stuff related to these areas.

Post by dml »

I tried a little DSP experiment using a reciprocal cache (after reading an old paper on this trick). It's basically equivalent to a (1/x) lookup table for the full range of (x), but with a limited hit rate - divisions still need to be performed to fill cache misses. Hits on reciprocals are considerably more frequent than full (x/y) pairs and storage is much lower.

The paper suggests that optimal returns vs storage can be found with as little as 128 entries, so I decided to use 256.
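
To make the idea concrete, here is a minimal sketch of a direct-mapped reciprocal cache, written in Python rather than DSP56k assembly. The post does not give the actual hash or number format, so the hash (low bits of x), the 23 fractional bits and all names below are assumptions for illustration only.

```python
CACHE_SIZE = 256  # entry count mentioned in the post


class ReciprocalCache:
    """Direct-mapped cache of fixed-point reciprocals, keyed by divisor."""

    def __init__(self, size=CACHE_SIZE, frac_bits=23):
        self.size = size              # number of slots (a power of two here)
        self.frac_bits = frac_bits    # assumed fixed-point scale of the result
        self.tags = [None] * size     # divisor currently held in each slot
        self.values = [0] * size      # cached reciprocal for that divisor
        self.hits = 0
        self.misses = 0

    def reciprocal(self, x):
        """Return floor(2**frac_bits / x), dividing only on a cache miss."""
        if x <= 0:
            raise ValueError("divisor must be positive")
        slot = x % self.size          # assumed hash: low bits of the divisor
        if self.tags[slot] == x:      # tag test: is this exact divisor cached?
            self.hits += 1
            return self.values[slot]
        self.misses += 1
        value = (1 << self.frac_bits) // x   # the 'real' divide
        self.tags[slot] = x                  # fill the slot, evicting the old entry
        self.values[slot] = value
        return value


if __name__ == "__main__":
    cache = ReciprocalCache()
    for z in [100, 100, 356, 100, 7, 7]:
        cache.reciprocal(z)
    print(cache.hits, cache.misses)
```

Because the tag stores the full divisor, a hit always returns exactly what the divide would have produced, which is why precision is not traded for space.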

The implementation is not highly optimized because I don't want to trash too many registers, consume 'fast' memory or use excessive program space. More could be done with it if those things can be afforded. However it manages a cache hash, tag-test & retrieve in 18 clock cycles (9 'nops') and a miss in considerably more (18 for cache query overhead + 64 for conditional divide subroutine incl. return = 82 total). A typical divide takes 56 for ccr + rep + 24 iterations + extract. A hit therefore saves 38 cycles and a miss costs 26 extra, so the hit rate needs to reach roughly 41% (26/64) to break even.

Testing this in the floor perspective calc instead of the existing 56-cycle divide yields 27084 cache queries and 10484 misses (where a full divide is required to fill the cache on each miss). That's a 39% miss rate, or 61% hit rate.

In terms of speedup, that's (27084*56)=1,516,704 cycles for a real divide, vs (10484*82)+(16600*18)=1,158,488 cycles for the cache version, or a 24% saving in cycles.

Cool :-)
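
The cycle figures above can be rechecked with a few lines of arithmetic. This sketch only uses the numbers quoted in the post (18-cycle hit, 82-cycle miss, 56-cycle plain divide, 27084 queries, 10484 misses); the function names are made up for the example.

```python
HIT_COST = 18          # cycles for hash + tag test + retrieve
MISS_COST = 18 + 64    # query overhead + divide subroutine incl. return
DIVIDE_COST = 56       # cycles for a plain inline divide


def break_even_hit_rate(hit, miss, divide):
    """Hit rate h at which h*hit + (1-h)*miss equals the plain divide cost."""
    return (miss - divide) / (miss - hit)


def cache_cycles(queries, misses, hit, miss):
    """Total cycles spent when every query goes through the cache."""
    return misses * miss + (queries - misses) * hit


if __name__ == "__main__":
    queries, misses = 27084, 10484
    plain = queries * DIVIDE_COST
    cached = cache_cycles(queries, misses, HIT_COST, MISS_COST)
    print(f"break-even hit rate: {break_even_hit_rate(HIT_COST, MISS_COST, DIVIDE_COST):.1%}")
    print(f"measured hit rate:   {(queries - misses) / queries:.1%}")
    print(f"plain divide cycles: {plain}")
    print(f"cached cycles:       {cached}")
    print(f"saving:              {1 - cached / plain:.1%}")
```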

This is mainly possible because floor perspective division has obvious redundancy - spans from different floor/ceiling surfaces share the same z, even if they are processed at separate times or are at different elevations. It's also not the only way to minimise divides for this task (edge adjacency lists would do that for floors) but it's very low on storage and can be shared across lots of tasks (walls, floors, seg vertices) which makes it interesting.

With further changes to make in BM, it may not keep its value, but it's another thing to think about if you have a lot of divides to do and want high resolution results but lack space for exhaustive tables. You can decide how much space you can afford and use that, without trading precision. (With a few kB of cache you can effectively cut reciprocal calculation time in half and maintain a 24-bit result).

[EDIT]

Note that the figures above are not collected from a single frame, but a series of frames. This is why they seem large :)

Post by dml »

Finished a massive tidy up and code commit last night for v4.05 - something that was getting in the way of further progress (too many small, semi-random changes outstanding). It would normally have been done in smaller steps but time has been limited recently and committing any changes always needs careful review and comments first.

Should be able to focus on some of the more important things again!

- BSPD replaced realtime WADRAM structure indexing with offline fixups (as per Doom)
- BSPD added 'trigger' line shortcut (as per Doom) - without any way to test as yet
- BSPD disabled linear/perspective paths, pending replacement with a simpler perspective path
- BSPD in-lined visibility test + other minor changes here
- BSPD out-lined prelight step on CPU, until it can be off-lined
- DOOM placed MMU30 under build switch
- DOOM deferred CACR init to CPU030 code
- DOOM added message indicating type of build (standard machine / modified system)
- CHKMOVE 'fixed' lockup bug (blockmap coords out of bounds) to prevent confusion with serious DSP lockups
- CHKMOVE 'magic number' indexing/mul fix
- DSP factored transparent wall insertion into ADW_T.ASM, alongside ADW_U/M/L.ASM
- DSP fixed 2-texel overrun bug in texture upload area (words decoded to triplets, 2 unused at end of stream)
- DSP added 256-entry reciprocal cache, in use by floors only for now
- DSP started adding [segbuffer] internal buffering of segment vertices
- DSP reorganised/regrouped low/high frequency constants
- DSP fixed some well hidden qv naming/aliasing issues in ProjectNode/Line
- DSP renamed ProjectLine to R_ViewTestAddLine, adopting similar naming to Doom for 1:1 reading
- WADRAM struct changes to support offline fixups
- MACROS remove bsslong (occasional phase errors), adopt CNOP in txtlong, datlong
- IO moved offline WADRAM processing steps into fixup_ functions, called only after all modules loaded
- LIGHTING runtime-align colour tables since BSS is statically word-aligned
- MIDI minor changes to test midi playback
- SYS rename 040 memcpy code to avoid accidents
- TIMING debug display for MMU30 registers
- VIDEO added use_mmu30_cache_inhibit extension for cache-inhibited shadow memory
- globally rename screen->framebuffer_window, less ambiguous for windowed vs fullscreen objects later
- added MMU30/ subtree & code

Post by Eero Tamminen »

dml wrote:Finished a massive tidy up and code commit last night for v4.05
Could you provide a new build with debug symbols for CPU & DSP?

I'm assuming profiling results from the earlier binary which had debug symbols would not be of any interest anymore. :)

Post by dml »

Eero Tamminen wrote: Could you provide a new build with debug symbols for CPU & DSP?
Yes I'll do this soon - although I don't think enough has been changed to significantly alter the profiling results. A bit perhaps but the relative cost of things will be much the same.

(The next group of modifications will change that but these will take a while to finish)

Post by dml »

The next step is to replace the wall texture mapping with a better solution - one that doesn't need any CPU setup at all, is smaller/simpler and eliminates extra host traffic. It should also eliminate the need for linear/perspective fast path selection and the test currently needed to prime it.

The implementation is actually closer to Quake than Doom, but on the DSP56k will be more efficient than either. The main difficulty will be maintaining the precision of previous versions - but initial tests look promising (it works, but doesn't clip properly yet and is piggybacked on the old version so performance is temporarily worse).

Once this is done, the old code will be trimmed away and the end result should be a pretty good speed increase relative to wall count on both CPU and DSP.

Doing this will break the current depth-cued lighting and cause a couple of other problems but those will be fixed later.

Post by Eero Tamminen »

dml wrote:although I don't think enough has been changed to significantly alter the profiling results. A bit perhaps but the relative cost of things will be much the same.
Ok, thanks, in that case I'll continue debugging the subroutine call/return based cost accumulation issues in Hatari code with the old binary.

Post by dml »

Eero Tamminen wrote: Ok, thanks, in that case I'll continue debugging the subroutine call/return based cost accumulation issues in Hatari code with the old binary.
Ok this will let me get on with the other changes. It should be more interesting to profile again afterwards - the improvements will need confirmation.

I can also disable some things next time which might get in the way of profiling. The BSP walk (data recursion with SP) might have to remain as it is, but the other alignment and stack stuff can be turned off.

Post by dml »

While this seems like an endless process of optimizing, experimenting, rewriting... there is a purpose in mind. I don't think BadMood was/is fast enough to run the game comfortably at the intended resolution, with everything else that will be needed (sprites, collisions, audio, AI etc - some of that in C also).

When it reaches a point where further *significant* improvements aren't realistic and/or it seems fast enough and consistent enough across many scenes, I'll abandon this mission and start breaking it up for joining with the Doom source. I could have started this already but it just seems too early while real speedups remain on the table...
calimero
Fuji Shaped Bastard
Posts: 2639
Joined: Thu Sep 15, 2005 10:01 am
Location: Serbia

Post by calimero »

dml wrote:The next step is to replace the wall texture mapping with a better solution - one that doesn't need any CPU setup at all,...
Will you use the "texture on DSP" approach as Mirko suggested?

Post by dml »

calimero wrote:Will you use the "texture on DSP" approach as Mirko suggested?
Yes, that was done a couple of versions ago :-) For floors anyway. Wall textures are too big and numerous so a different method will be used.

(for walls a mixture of DSP + CPU, and maybe DSP-only for small, distant texture pages as a final optimization)

Post by dml »

So replacing all the wall texture mapping math yesterday was a successful exercise... I found it was possible to calculate column texture and also column [luma] from a single set of [pz],[puz] interpolants (using a nice trick to derive the real-z from the otherwise unsuitable perspective correction [pz] interpolant almost for free), and an accelerated reciprocal lookup for the perspective divide.

So the per-column DSP code for wall rendering attributes is half the size it was, and more than twice as fast.
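
For readers unfamiliar with the general technique, here is a minimal floating-point sketch of perspective-correct interpolation across wall columns. It assumes that [pz] behaves like 1/z and [puz] like u/z, both interpolated linearly in screen space, so that one reciprocal per column yields both the real z (usable for depth-cued luma) and the texture u. This is the textbook method only, not the DSP fixed-point code or the specific trick described above, and `perspective_columns` is a hypothetical name.

```python
def perspective_columns(x0, x1, z0, z1, u0, u1):
    """Return (x, u, z) for each screen column from x0 to x1 inclusive.

    z0/z1 are view-space depths and u0/u1 texture coordinates at the
    two wall ends. 1/z and u/z are linear in screen x, so they are the
    quantities that get interpolated.
    """
    span = x1 - x0
    inv_z0, inv_z1 = 1.0 / z0, 1.0 / z1
    u_over_z0, u_over_z1 = u0 / z0, u1 / z1
    columns = []
    for x in range(x0, x1 + 1):
        t = (x - x0) / span if span else 0.0
        pz = inv_z0 + (inv_z1 - inv_z0) * t            # interpolated 1/z
        puz = u_over_z0 + (u_over_z1 - u_over_z0) * t  # interpolated u/z
        z = 1.0 / pz        # one reciprocal per column recovers real z
        u = puz * z         # and the same value perspective-corrects u
        columns.append((x, u, z))
    return columns


if __name__ == "__main__":
    for x, u, z in perspective_columns(0, 4, 1.0, 3.0, 0.0, 64.0):
        print(x, round(u, 3), round(z, 3))
```

Note how the texture coordinate at the middle column is well below the linear midpoint, which is exactly the distortion a linear path would miss on steeply angled walls.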

This doesn't help much on its own since the DSP was already spending a lot of time waiting around. But it does allow a lot of old per-wall CPU & DSP setup code (u-clipping, offsets etc.) to be stripped out very soon and that will help a lot.

There is one (fairly typical) new bug causing some textures to be offset wrongly, but not unexpected after so many changes. Probably a sign flipping or extension issue somewhere along the line in the original code which has been uncovered.

The column height/texture [v] part could benefit from similar treatment, with much less to gain but still worth reducing - so I'll look at that in the evening if I get time.

Post by dml »

Good progress recently.

Replaced old texture mapping with Quake-like solution, modified for DSP fixed point math (perspective-correct attribute plane lookup in screen space). I tried this before on the DSP when I ported Quake to the Falcon/AB40 but it caused a lot of problems - Quake depends heavily on floating point dynamic range and the techniques employed there are difficult to implement properly in narrow range fixed point. This newer version is better adapted for fixed point use - it works well and is fast.

This also means a full Quake surface renderer (with cheap pixel-level clipping) can be done on the DSP using the same method (something for another day).


So having replaced the old code I was able to take out a lot of CPU-side wall setup math and this has resulted in a bit of a speedup. There is still a lot of that left though and I'm working through it as time allows.

There might be more to report after the hols.

Post by dml »

I got a little time to run the data-cache experiments again properly. The differences are not large but they are measurable and consistent, for what it's worth.

Tests were done on a plain Falcon without any modifications, at 50Hz on an RGB monitor using code based on BM v4.03F.

write-allocate flag     cache-inhibited framebuffer       measured FPS            change vs reference
off                     off                               7.6923                  <reference>
on                      off                               7.7294                  +0.5%
on                      on                                7.7669                  +1.0% 
I should add that I didn't try cache-inhibiting some other relevant things e.g. displaylist or other sequentially written data. So it's worth trying that sometime as well and is easy now with the 'shadow bit' trick.
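
For anyone wanting to redo the last column of the table with their own measurements, the change is simply the ratio to the reference FPS. A small sketch using the three values from the table (the helper name is arbitrary):

```python
def percent_change(value, reference):
    """Relative change of value against reference, in percent."""
    return (value / reference - 1.0) * 100.0


if __name__ == "__main__":
    reference_fps = 7.6923
    for label, fps in [("write-allocate on", 7.7294),
                       ("write-allocate on + cache-inhibited framebuffer", 7.7669)]:
        print(f"{label}: {percent_change(fps, reference_fps):+.1f}%")
```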

Some simple tasks (e.g. copying floor pixels from the DSP) are always faster with the data cache completely off - but there is zero read redundancy in that task so it's not a big surprise. It is a little surprising the data cache has some kind of fixed overhead for just having the thing enabled but it really only shows up when block filling or moving and that sort of code is easy to identify and special-case anyway. It's mainly more complex bits of code with temporary state or table indirection that benefit from these other tricks.

Not a big gain in speed and not what I should really be doing right now but somebody might find it interesting to have rough numbers.

Post by dml »

Got a bit more done this weekend, although only a couple of hours of coding time really; it was holiday time after all!

From 7.7669 FPS in v4.03 to...
[attached screenshot: 406snap.jpg]
...in v4.06 (same testing conditions, same machine).

And with the stretchy wall bug fixed too (which had previously inflated the FPS very slightly because of texel reuse)

The code is a mess and a bunch of things are not working very well so more effort is needed to finish replacing code, repair broken stuff and remove all the old dead bits and pieces.