Bad Mood : Falcon030 'Doom'


Postby f030 » Thu Feb 14, 2013 6:06 pm

It's better now, everything is readable.
texcache appears after 20 seconds with these numbers: 0.51% 0.62ms, and its name changed to projwallo.
It stays that way even after 15 minutes, with the same numbers.
Do you need any more info?


Postby dml » Thu Feb 14, 2013 6:15 pm

f030 wrote:It's better now, everything is readable.
texcache appears after 20 seconds with these numbers: 0.51% 0.62ms, and its name changed to projwallo.
It stays that way even after 15 minutes, with the same numbers.
Do you need any more info?


Yes, it would be good to have the figures from the last column (milliseconds) on a 16MHz system with fastram, with the names assigned, and then the same again without fastram to compare.

The first 9 rows will be enough. The figures will wobble a bit but don't need to be exact, just enough to compare roughly.

Make sure the names stop moving around and settle into their positions before taking any values.

It should look something like this:

Code:

renderflats   54.45ms
renderwalls   21.05ms
profiling     18.18ms
addwallseg    15.96ms
bldssector    15.45ms
getssector     8.68ms
nodeincone     6.30ms
perptest       2.83ms
bspwalk        2.67ms


Postby f030 » Thu Feb 14, 2013 7:45 pm

fastram:

renderflats 47.24ms
renderwalls 18.44ms
profiling 13.60ms
addwallseg 10.81ms
bldssector 7.84ms
getssector 7.95ms
nodeincone 6.13ms
perptest 2.57ms
bspwalk 1.76ms

without:

renderflats 48.15ms
renderwalls 19.48ms
profiling 15.56ms
addwallseg 14.50ms
bldssector 13.57ms
getssector 8.40ms
nodeincone 5.70ms
perptest 3.27ms
bspwalk 2.32ms


Postby dml » Thu Feb 14, 2013 8:07 pm

f030 wrote:fastram:


OK, that's great. These two are the ones that need cache optimization (and are the ones I was focusing on, but hadn't finished). Times are without fastram -> with fastram:

addwallseg 14.50ms -> 10.81ms
bldssector 13.57ms -> 7.84ms


Thanks again.


Postby dml » Fri Feb 15, 2013 11:00 am

More notes from last night...

I measured a small but real speed advantage on the 68030 by replacing Devpac 'even' alignment directives (2-byte alignment) with instances of the macro below. (Note: no such effect was measurable in Hatari, possibly because of the lack of data cache, but perhaps for other reasons too.)

Code:


use_alignment
alignment      =   4

align         macro
              ifd           use_alignment
.pos\@        =             *
.pad\@        =             (alignment-((((.pos\@)-1)&(alignment-1))+1))
              dcb.b         (.pad\@),$AA  ; usually disassembles as 'linea $aaa' - easy recognition
              elseif
              even
              endc
              endm


txtlong       macro
              align
              text
              align
              endm

bsslong       macro
              align
              bss
              align
              endm

datlong       macro
              align
              data
              align
              endm





The txtlong/datlong/bsslong macros were placed just before some long-sized writable structures and long-distance function entrypoints, and before functions which contain high performance loops and don't quite completely fit in the i-cache (since fewer cache entries will be straddled with better alignment). They are also used when alternating between sections for table generation etc.
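As a quick sanity check of the padding expression used in the align macro above, here is a minimal Python sketch. It only mirrors the arithmetic; the function names are made up for illustration, and it assumes the alignment is a power of two (which the mask in the macro requires).

```python
def align_pad(pos: int, alignment: int = 4) -> int:
    """Padding bytes produced by the macro's expression for position `pos`.

    Mirrors: alignment - ((((pos) - 1) & (alignment - 1)) + 1)
    `alignment` must be a power of two for the mask to work.
    """
    return alignment - (((pos - 1) & (alignment - 1)) + 1)


def simple_pad(pos: int, alignment: int = 4) -> int:
    """More familiar equivalent: distance from `pos` to the next multiple."""
    return (-pos) % alignment


# Both forms agree for every position, including already-aligned ones.
for pos in range(64):
    assert align_pad(pos) == simple_pad(pos)
```

An already aligned position gets no padding, and a position one byte past a boundary gets alignment-1 bytes, which is the behaviour the macro relies on.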

It might be possible to create a macro which aligns the start of a function, based on a forward reference to a specific location elsewhere in the function, so you can 'mis-align' the function start, such that a critical loop ends exactly at the end of a 4- or 16-byte cache entry or line. This would probably be optimal for high performance loops, or for large loops - especially if the last instruction of the outer loop freezes the i-cache before the 2nd iteration - preventing i-cache thrashing and ensuring the whole cache is used effectively for at least the majority of the loop. I haven't tried this but it's worth a look if Devpac can cope with the forward reference.
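The arithmetic behind that idea can be sketched in Python. This is only an illustration of the calculation such a macro would need, not something tested with Devpac; the function names, the example addresses and the 40-byte loop are all hypothetical.

```python
def pad_for_loop_end(pc: int, loop_end_offset: int, line: int = 16) -> int:
    """Padding to insert at `pc` (before the function) so that the byte just
    past the loop, at function start + loop_end_offset, lands on a
    `line`-byte boundary."""
    return (-(pc + loop_end_offset)) % line


def lines_touched(start: int, length: int, line: int = 16) -> int:
    """Number of `line`-byte cache lines covered by `length` bytes at `start`."""
    return (start + length - 1) // line - start // line + 1


# Hypothetical function at $1000 whose critical loop ends $2E bytes in.
pad = pad_for_loop_end(0x1000, 0x2E)
assert (0x1000 + pad + 0x2E) % 16 == 0

# A 40-byte loop needs at least 3 lines; an unlucky start costs a 4th.
assert lines_touched(0x1008, 40) == 3
assert lines_touched(0x100A, 40) == 4
```

Because 68k instructions are word-sized, the program counter and the offset are both even, so the padding computed this way is always even as well.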

I suspect alignment of 16 would also help on 68040/68060 for similar reasons (but have not tested it with this program).

Will go through the code again later and align more material, and add some runtime alignment checks on critical structures to make sure everything is as expected.


Postby dml » Fri Feb 15, 2013 12:58 pm

These are actual figures collected from tests on two CPU-intensive functions: (A) with forced long-alignment of both data and the within-function stack pointer register, (B) with just aligned data, and (C) without alignment. Times are in milliseconds. 4-byte alignment clearly helps even on a 16-bit bus, if used with care and properly verified around hotspots with self-test macros.

Code:

             data+sp   data      none
bldssector   16.4      16.6      17.0   
addwallseg   15.7      15.9      16.0
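To put those figures in relative terms, a small Python sketch (the dictionary layout and function name are just for illustration; the numbers are the ones from the table above):

```python
# Timings in milliseconds, copied from the table above.
timings_ms = {
    "bldssector": {"data+sp": 16.4, "data": 16.6, "none": 17.0},
    "addwallseg": {"data+sp": 15.7, "data": 15.9, "none": 16.0},
}


def saving_percent(baseline_ms: float, measured_ms: float) -> float:
    """Time saved relative to the unaligned baseline, in percent."""
    return 100.0 * (baseline_ms - measured_ms) / baseline_ms


for name, row in timings_ms.items():
    for variant in ("data", "data+sp"):
        gain = saving_percent(row["none"], row[variant])
        print(f"{name:12s} {variant:8s} {gain:4.1f}% faster than unaligned")
```

With data and stack pointer both aligned, that works out to roughly 3.5% for bldssector and just under 2% for addwallseg, which fits the "small but real" description.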


Postby dml » Fri Feb 15, 2013 6:20 pm

A summary of changes & status for BadMood engine over the last month or so - I did a quick review of my code changesets before starting the next pass...


- scene displaylist batching functional, except for:
--> sprites - disabled
--> addwallseg to be optimised for batching
--> bldssector to be broken up & optimised for batching
- direct DSP texturing
- texture stream test/measurement (1 texture)
- mipmap rendering
- mipmap colour reduction & error diffusion
- FPU perspective-correct test
- linear/perspective wall mode bugfix
- data alignment & verification
- SP alignment in BSPwalk/bldssector
- DSP scene init/height bugfix
- transparent wall bugfix
- profiler bugfixes, more profiling
- profiling absolute times
- more metrics
- window size indicator on resize
- VGA scanline adjust fixed for default mode
- some self-debugging tools

temporarily broken (all related to floor/ceiling changes, will be restored gradually):
- sprites
- floor/ceiling lighting
- sky
- turbulent flats (lava, water etc.) - needs a new DSP shader, probably will be on hold for a while
- texture state changes
- brickbat/double-pixel rendering

other todo:
- remove temporary FPU requirement
- implement compressed texture streaming
- rework walls
- redo sprite & transparency rendering stuff from scratch
- DSP wall/floor code temporarily de-optimized in critical areas, needs redoing
- new video mode(s) for optimal bus utilization
- cache control
- break up engine into core and non-core (i.e. 'wad viewer') material for producing library -> Doom game binding without breaking anything


Postby dml » Fri Feb 15, 2013 8:41 pm

The next task will be sorting out the visplane renderstate group deferral inside the DSP so multiple textures can be used again, and implementing the texture streaming for that, and some other things related to it, like the broken sky and sector lighting fx.

This is fiddly and will take a while. Won't be much else happening before these things are fixed.


Postby Ato » Sun Feb 17, 2013 4:06 pm

DML:
Thanks a lot for your very interesting blog!

:cheers: :cheers:

dml wrote:I measured a small but real speed advantage on the 68030 by replacing Devpac 'even' alignment directives (2-byte alignment) with instances of the macro below. (Note: no such effect was measurable in Hatari, possibly because of the lack of data cache, but perhaps for other reasons too.)


I can imagine that you hit an optimisation path that the 68k did not have. The MC68030 seems to "prefer" to have its data aligned to match the operand size of the instruction. Here is what I found in the M68030 User's Manual:

Notice that the MC68030 does not require data to be aligned on word boundaries (refer to Figure 2-2), but the most efficient data transfers occur when data is aligned on the same byte boundary as its operand size.


Luckily Devpac 3 has an assembler directive for that:

cnop offset, alignment

where alignment is the boundary in bytes (e.g. 2 or 4) and offset is the number of bytes past that boundary at which to place the PC, e.g.

cnop 0,4

will instruct the assembler to align the PC to the next long-word relative to the start of the current section (text, data, bss). So in my understanding, it would make sense to start long-word data after a cnop 0,4 and word data after cnop 0,2.
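A rough Python model of what the directive does to the program counter, assuming (this is an assumption, not taken from the Devpac manual text) that cnop moves the PC forward to the next address lying `offset` bytes past a multiple of `alignment`, counted from the start of the current section. The function name is made up for illustration.

```python
def cnop(pc: int, offset: int, alignment: int) -> int:
    """Section-relative PC after `cnop offset,alignment` under the assumed
    semantics: the next address >= pc equal to offset modulo alignment."""
    return pc + ((offset - pc) % alignment)


# cnop 0,4 : round up to the next long-word boundary (no-op if already there)
assert cnop(5, 0, 4) == 8
assert cnop(8, 0, 4) == 8

# cnop 0,2 : same effect as 'even'
assert cnop(7, 0, 2) == 8
```

Under this model `cnop 0,4` before long-word data and `cnop 0,2` before word data give exactly the alignment suggested above.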

Hth. Cheers,
T.


Postby dml » Sun Feb 17, 2013 6:58 pm

Ato wrote:DML:
Thanks a lot for your very interesting blog!


It has become a bit of a blog yes :-) wasn't really part of the plan but has probably found the right home, with other Atari coders around.

Ato wrote:I can imagine that you hit an optimisation path that the 68k did not have. The MC68030 seems to "prefer" to have its data aligned to match the operand size of the instruction. Here is what I found in the M68030 User's Manual:


Interesting - I knew about some effects on cache boundaries but not the sensitivity to operand size with alignment. I should have a look to see if there's some way to take advantage of it without lots of effort (like a pairing approach which keeps 4-byte ops together or multiples of 4 apart)

Ato wrote:Luckily Devpac 3 has an assembler directive for that:
cnop offset, alignment


Ok I think this completely replaces my macro then. I should really go through the manual properly one day and find all these gems instead of taking the long route :)


Postby dml » Sun Feb 17, 2013 8:56 pm

For what it's worth... Doom2 running in Hatari/Falcon030 mode. At about 1fps!

Still working on more important things but it's still funny to see it going with so few changes to the source.

f030ld.jpg

Postby Ato » Sun Feb 17, 2013 10:46 pm

dml wrote:For what it's worth... Doom2 running in Hatari/Falcon030 mode. At about 1fps!


Awesome! :cheers: :cheers:


Postby FedePede04 » Sun Feb 17, 2013 10:51 pm

Damn you are good :D

Postby Zamuel_a » Mon Feb 18, 2013 12:22 pm

How many FPS do you think would be possible to get at the end on a stock Falcon? Maybe it's very hard to calculate that at the moment. Do you run at 320x240 or 160x240 resolution? From what I can see on the Jaguar, they run DOOM at 160x240 which feels a little low, but maybe it wasn't possible to do it faster on that machine.

Postby dml » Mon Feb 18, 2013 1:22 pm

Zamuel_a wrote:How many FPS do you think would be possible to get at the end on a stock Falcon?


It's still quite difficult to estimate because many changes are still needed and eventually there will be 'options' available for additional speed at some loss of image quality (either resolution, or texture detail, disabling mipmaps, depth cue fx etc...) but I'm trying to treat these 'options' as a separate thing and focusing on maximum possible framerate with default settings - with everything turned on.

The levels also make things difficult - some scenes run smoothly and others not so well. Currently there are obvious performance problems with complex scenes caused by upper/mid/lower wall processing (i.e. not due to the number of pixels needing to be filled or the texture count used - which recently have been near-constant costs). This is probably the main limiting factor just now for a 'playable' game because it affects performance even with a small window. After this part has been improved it may be easier to guess the end result.

Converting the floors/ceilings to use DSP textures has added a new cost - currently around 3.8ms per unique floor texture on-screen. This is way too much and with BSP-order drawing is more than cancelling the benefit of using the DSP for that job - but the per-texture cost will probably be cut in half with texture compression and the number of texture uploads reduced from around 15-20 (without renderstate deferral) to about 4-8 (with deferral). So around 15ms peak for texture transfers, which is a good chunk of a vblank :-z - still quite a bit faster than the old version though.
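The estimate above is simple multiplication, sketched here in Python. The function names are illustrative only; the 3.8ms figure, the texture counts and the "cut in half" factor are the ones quoted in the post, and the halving is an expectation rather than a measurement.

```python
PER_TEXTURE_MS = 3.8  # current cost per unique floor texture on-screen


def upload_cost_ms(texture_count: int, per_texture_ms: float) -> float:
    """Total per-frame texture upload cost in milliseconds."""
    return texture_count * per_texture_ms


def cost_range_ms(min_textures: int, max_textures: int, per_texture_ms: float):
    """(best, worst) upload cost for a range of texture counts."""
    return (upload_cost_ms(min_textures, per_texture_ms),
            upload_cost_ms(max_textures, per_texture_ms))


# Without renderstate deferral: 15-20 uploads at full cost.
now = cost_range_ms(15, 20, PER_TEXTURE_MS)

# With deferral (4-8 uploads) and compression halving the per-texture cost.
planned = cost_range_ms(4, 8, PER_TEXTURE_MS / 2)

print(f"now:     {now[0]:.1f} - {now[1]:.1f} ms")
print(f"planned: {planned[0]:.1f} - {planned[1]:.1f} ms")
```

The worst planned case (8 uploads at 1.9ms) comes to about 15ms, which is where the "around 15ms peak" figure comes from.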

Because of the texture upload cost mounting up in very complex levels, I'll probably keep a 2nd 'hybrid' floor rendering method which doesn't require texture uploads and shares duty with the CPU - lower fill rate but constant cost with unlimited texture count.

Zamuel_a wrote:Maybe it's very hard to calculate that at the moment. Do you run at 320x240 or 160x240 resolution? From what I can see on the Jaguar, they run DOOM at 160x240 which feels a little low, but maybe it wasn't possible to do it faster on that machine.


The default resolution is currently 320x168 for the 3D view, which leaves room for the status bar at the bottom (320x200 including status bar) - the same as the original PC Doom. FPS measurements so far have been done in this resolution - however this is not great for a stock Falcon even after all optimizations are made. There's just too much processing going on and I expect the C code for the game itself is going to suck more time away too.

BTW the Jag version was 160x180 lines for the 3D view, and 40 lines for the status bar (220 lines total).

I speculate that somewhere around 256x168 for the 3D view (200 lines total) using special Falcon video modes (double pixels + L/R overscan) will allow a decent framerate in the end, providing the Doom game code isn't too greedy and doesn't need a ton of optimization applied to it. Worst case it will be like the Jag version - i.e. 160x180 or something in that area.
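For a feel of how much fill work each of those 3D view sizes implies, here is a small Python comparison of pixel counts. It only compares raw pixel counts; it says nothing about per-frame fixed costs, which the posts above describe as significant. The dictionary keys are just labels.

```python
# 3D view sizes mentioned above (width, height), status bar excluded.
views = {
    "current 320x168": (320, 168),
    "target 256x168": (256, 168),
    "Jaguar 160x180": (160, 180),
}


def pixel_count(size):
    """Number of pixels in a (width, height) view."""
    width, height = size
    return width * height


baseline = pixel_count(views["current 320x168"])
for label, size in views.items():
    pixels = pixel_count(size)
    print(f"{label:16s} {pixels:6d} pixels  {100 * pixels / baseline:5.1f}% of current")
```

The 256x168 target has 80% of the pixels of the current default view, and the Jaguar-sized view a little over half.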


Postby dml » Tue Feb 19, 2013 11:05 am

Late last night I got renderstate/texture group sorting, texture streaming and texture compression working - so the view looks nearly correct again.

However there are still a few problems.

Lighting does not work yet because the DSP spanbuffer retained only enough information until the endscene event to transform the floor texture coordinates (x1,z), having sent the rest of the info (x2,luma) back to the CPU on each ssector fetch earlier during the BSP walk. This has caused some problems - I had to change the spanbuffer format to pack the [luma:x2:x1] data into one word so everything can be kept on the DSP until endscene without bloating the spanbuffer. It does mean the ssector fetch may be simplified to just tracking position of the spanbuffer for drawing sync later, instead of extracting per-span stuff. This might make it faster and/or more concurrent generally when fixed since the getssector routine can just kickstart the DSP defragmentation process for a section of floor without waiting for any results...
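To illustrate the idea of packing [luma:x2:x1] into a single 24-bit DSP word, here is a Python sketch. The field widths are purely an assumption for illustration (9 bits per x coordinate, enough for a 320-pixel-wide view, and 6 bits for luma); the actual layout used in BadMood is not stated in the post.

```python
X_BITS = 9      # hypothetical: 0..511 covers a 320-pixel-wide view
LUMA_BITS = 6   # hypothetical: the remaining bits of a 24-bit word
X_MASK = (1 << X_BITS) - 1
LUMA_MASK = (1 << LUMA_BITS) - 1


def pack_span(luma: int, x2: int, x1: int) -> int:
    """Pack one span as [luma:x2:x1] into a single 24-bit word."""
    if not (0 <= luma <= LUMA_MASK and 0 <= x1 <= X_MASK and 0 <= x2 <= X_MASK):
        raise ValueError("field out of range for the assumed bit widths")
    return (luma << (2 * X_BITS)) | (x2 << X_BITS) | x1


def unpack_span(word: int):
    """Recover (luma, x2, x1) from a packed 24-bit word."""
    x1 = word & X_MASK
    x2 = (word >> X_BITS) & X_MASK
    luma = (word >> (2 * X_BITS)) & LUMA_MASK
    return luma, x2, x1


word = pack_span(luma=31, x2=319, x1=12)
assert word < (1 << 24)
assert unpack_span(word) == (31, 319, 12)
```

The point is only that all three values fit in one word, so the spanbuffer can keep everything until endscene without growing.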

The performance obviously isn't as good as the simple testcase with one texture - it is roughly what I predicted (around 10-15ms for all texture uploads) so no real surprises, but it is a constant overhead for any window size - so it increasingly offsets some of the savings with DSP texturing as the window is shrunk. The benefits are mainly seen with a larger window. That might be fine since the aim is to have 256x168 - but it would be nice to kill some of the overhead without hacking out effects or texture quality. There is still some room to do things but it's probably running up against limits now for this approach while retaining all the truecolour lighting levels. One of the options was to burn depth-cue lighting into the mipmaps instead of transmitting a complete 32-level lighting table/palette with each texture, and reduce that to 4 or 8 levels for sector-level effects only but this means a bigger palette per texture, and reverting the texture compression from 3 texels per word, back to 2 - so perhaps not much gain to be had in the end, in exchange for quite a lot of trouble.

I'll compare the hybrid version to see how they both scale so one or other can be chosen as the best default.

Having said all of that, it is faster now than before so it was worth the effort - and the primary cost is now walls again for anything but the simplest view. So focus will move to ssector processing and wall generation/rendering next.

Will be busy for the rest of the week but may have something viewable again before the weekend.


Postby dml » Wed Feb 20, 2013 11:51 pm

Tonight I did some profiling of BadMood using the Hatari debugger/profiler, after making a few experimental modifications to the profiler data collection and generating views on the exported data in Excel (if you read the Hatari developer mailing list you'll have seen something on this). Generally I have found the Hatari profiler to be a fantastic tool for DSP work - it's possible to see some things happening which are difficult to figure out using real hardware.

The short story is, I was able to reliably locate all the sites in the DSP code where the DSP wastes time waiting for the CPU, and a few cases where the CPU waits for the DSP. I was also able to find some wasteful operations, code which is causing penalties because of the location of various things on the DSP, code which is used a bit more than I expected, code which is larger than I expected and so on... basically a bunch of things which can be improved and speeded up.

It's also clear that the DSP isn't really a bottleneck most of the time - which might be more good news for further optimization.

Need to be a bit careful though - much of this is time-critical data and Hatari's timings don't quite match real hardware, mainly for the CPU but I suspect also for the HOST port overhead between the two devices. This still needs to be checked separately. For the DSP on its own the timings are very, very close though - almost exact.

...a couple of the Excel views made from the DSP profiling data, collected from a 5-second run. The coloured columns are activity, time, cycle, memory penalty hotspots. The last column highlights transmit/receive bottlenecks - either read or write. All useful stuff.

Image

Image


Postby Dal » Thu Feb 21, 2013 10:01 am

I am in awe of the level to which you are taking this project. Truly impressive work - 2013 seems to be another great year for Atari. Thank you! :)

Postby dml » Thu Feb 21, 2013 11:10 am

Dal wrote:I am in awe of the level to which you are taking this project. Truly impressive work - 2013 seems to be another great year for Atari. Thank you! :)


:cheers:

It's fun to work on and a way to hone some skills. Hopefully somebody else out there will pick something up along the way which will lead to more Atari/Falcon stuff appearing.

BTW is there such a thing as a DSP tutorial on the forum? When I first returned to this project I found lots of things in my code that I didn't quite understand the reason for, then remembered why it's like that - and it might be useful if some of those things get documented for others. I don't know how many people are actually trying to write DSP stuff out there but I imagine the learning curve is a bit steep for a beginner. Anima started a good thread on tools and cross-compiling code but I wasn't able to find much on actual programming & code examples (at least not so far).

I might start a simple scratchpad thread for DSP coding/optimisation notes. A proper tutorial is probably a bit too much for me right now but random notes and code fragments is one way to let others see how things work in context. It might also be good for BM if others who have written DSP stuff can drop some of their own tips I have probably missed myself :-)


Postby dml » Fri Feb 22, 2013 10:52 am

So the last thing I did was update the profiler Excel sheet with some rules to count references to variables, buffers & constants stored in both internal and external DSP memory.

One rule looks at the program extension word column which tends only to be used for addresses or long immediates/constants - the other rule looks at the disassembly text like x:$<addr> etc.

Since the profiler output contains hit counts for every address/opcode as well, I added a rule to sum the hitcounts for each reference found. So I now know (A) the number of references made to each variable in the code and (B) how many times each variable reference is actually executed at runtime.
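The spreadsheet rules described above could be sketched in Python roughly as follows. The row format (address, hit count, disassembly text) and the sample rows are hypothetical, not the actual Hatari profiler export, and only the disassembly-text rule (matching references like x:$<addr>) is modelled.

```python
import re
from collections import defaultdict

# Matches memory references such as x:$0010, y:$0020 or p:$0100.
REF_PATTERN = re.compile(r"\b([xyp]):\$([0-9a-fA-F]+)", re.IGNORECASE)


def summarise_references(rows):
    """Return {reference: (times_executed, references_in_code)}.

    `rows` is an iterable of (pc, hit_count, disassembly_text) tuples.
    """
    totals = defaultdict(lambda: [0, 0])
    for _pc, hits, disasm in rows:
        for space, addr in REF_PATTERN.findall(disasm):
            key = f"{space.lower()}:${int(addr, 16):04x}"
            totals[key][0] += hits   # (B) runtime executions
            totals[key][1] += 1      # (A) static references
    return {key: tuple(value) for key, value in totals.items()}


# Hypothetical profiler rows, for illustration only.
sample_rows = [
    (0x0040, 1200, "move x:$0010,x0"),
    (0x0041, 1200, "mpy x0,y0,a y:$0020,y1"),
    (0x0052, 3, "move a,x:$0010"),
]
summary = summarise_references(sample_rows)
```

Here x:$0010 is referenced twice in the code and executed 1203 times in total, which is the same pair of numbers the sheet reports per variable.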

Image

This is useful because the location of variables (etc) determines short or long addressing (instruction size = cycle time), and the possibility of parallel moves. This translates to code size and performance. It's also useful for hunting for duplicate constants, e.g. where $008000 happens to be declared as one of several shift multipliers, but also as a mask used for combining a 16-bit sign, and so on. If the same constant is declared several times in different forms and executed tens of thousands of times or more, it's worth aliasing a single declaration to several different names, and placing it in fast memory.

e.g. this single constant has two aliases so it can be used for shifting left 6 bits or for shifting right 18 bits (24-6=18) depending on how the result is extracted from the accumulator. The high activity rate in [] brackets shows it's best kept in the first 64 words of fast memory so it costs nothing to access.
rshft18:
lshft6: dc (1<<(6-1)) ; [020100]
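Why one constant serves both purposes can be shown with a small Python model of the DSP multiply. This is a simplified sketch under stated assumptions: operands are treated as non-negative 24-bit integers, and the multiply is modelled as a fractional MPY whose product is shifted left by one bit into a 48-bit accumulator split into a high word (a1) and a low word (a0). The function name is made up.

```python
WORD_MASK = 0xFFFFFF          # one 24-bit DSP word
SHIFT_CONST = 1 << (6 - 1)    # the value declared as rshft18 / lshft6


def mpy_nonnegative(x: int, y: int):
    """Model a fractional multiply for non-negative 24-bit operands.

    The integer product is doubled (the implicit fractional shift) and
    returned as (a1, a0): the high and low 24-bit halves of the result.
    """
    product = (x * y) << 1
    return (product >> 24) & WORD_MASK, product & WORD_MASK


x = 0x123456
a1, a0 = mpy_nonnegative(x, SHIFT_CONST)
assert a0 == (x << 6) & WORD_MASK   # low word: x shifted left by 6
assert a1 == x >> 18                # high word: x shifted right by 18
```

So reading the low half of the accumulator gives the left shift by 6, and reading the high half gives the right shift by 18, matching the 24-6=18 note above.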


The exercise has been quite revealing - it has shown that some variables have been placed inappropriately either in 'fast' internal ram, or external ram. i.e. some items were placed as 'blocks' according to association and use in various routines - not by their individual performance characteristics. Since there is a shortage of 'fast' memory (addresses 0-63 are the fastest, 0-255 are next, and everything else above that is long addressing and bus conflicts = can be slow) placement of frequently used things is key for speed.

In the snippet below, the [P:Q] values mean... P=number of times reference was executed, Q=number of references in the code. There are a few variables in there with a higher incidence in the code itself, but very low execution activity - they don't deserve to be in 'fast' memory. Now it's easy to see where things should go.

Code:

;-----------------------------------------------------------------------*
;   Quick immediate pointers and addresses                              *
;-----------------------------------------------------------------------*

c_HTX_ptr:                 dc   HTX                              ; [039404:14] host port
c_imvpedgearray_x1_ptr:    dc   imvpedgearray_x1                 ; [008876:03] left edge array
c_imvprun_lastslots_ptr:   dc   imvprun_lastslots                ; [016905:05] linked run last-slot addresses
c_imvprun_data_c_ptr:      dc   (imvprun_cache+imvprun_data_c)   ; [000387:02] ceil. run buffer (runs)
c_imvprun_link_c_ptr:      dc   (imvprun_cache+imvprun_link_c)   ; [000387:02] ceil. run buffer (links)
c_imvprun_data_f_ptr:      dc   (imvprun_cache+imvprun_data_f)   ; [000347:02] floor run buffer (runs)
c_imvprun_link_f_ptr:      dc   (imvprun_cache+imvprun_link_f)   ; [000347:02] floor run buffer (links)
c_x1list_ptr:              dc   x1list                           ; [000476:01] run x1 merge-list
c_x2list_ptr:              dc   x2list                           ; [000476:01] run x2 merge-list
c_oslist_ptr:              dc   oslist                           ; [022711:01] run offset list
c_tlist_ptr:               dc   tlist                            ; [011593:01] run tracking list
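Using the [P:Q] figures from the listing above, a Python sketch of the placement decision might look like this. The ranking rule (sort by execution count, ignore static reference count) and the slot budget of 4 are assumptions made for illustration; the real decision also depends on addressing modes and parallel moves.

```python
# (name, times_executed P, references_in_code Q), copied from the listing.
pointers = [
    ("c_HTX_ptr", 39404, 14),
    ("c_imvpedgearray_x1_ptr", 8876, 3),
    ("c_imvprun_lastslots_ptr", 16905, 5),
    ("c_imvprun_data_c_ptr", 387, 2),
    ("c_imvprun_link_c_ptr", 387, 2),
    ("c_imvprun_data_f_ptr", 347, 2),
    ("c_imvprun_link_f_ptr", 347, 2),
    ("c_x1list_ptr", 476, 1),
    ("c_x2list_ptr", 476, 1),
    ("c_oslist_ptr", 22711, 1),
    ("c_tlist_ptr", 11593, 1),
]


def rank_by_activity(entries):
    """Sort entries by runtime execution count, busiest first."""
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def fast_memory_candidates(entries, slots):
    """Names of the `slots` most frequently executed references."""
    return [name for name, _executed, _refs in rank_by_activity(entries)[:slots]]


print(fast_memory_candidates(pointers, slots=4))
```

Note how c_oslist_ptr and c_tlist_ptr have only one reference each in the code but rank second and fourth by execution count, while the run buffer pointers have more references but very little activity.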


I also did some other stuff with naming which has led to optimizations but I'll detail that another time, it's difficult to summarize quickly.


Postby dml » Fri Feb 22, 2013 6:29 pm

Fixed the simple sky/invalid-texture detect bug that was preventing Doom2 and Ultimate Doom WADs from working - so I can start using some of the horrendously complex level data from UD for performance testing. Like the one below, which does work but is very slow due to many visible surfaces and lack of effective occlusion.

Image

I'll probably just stop using e1m1 for reference (except for large improvements) and concentrate on this nasty stuff as small speedups can be detected much more easily.

The DSP texture streaming can be seen working there but without any lighting, so floors are fixed at 50% intensity until that's fixed.

Still working on DSP code, no worthwhile updates as yet.


Postby kristjanga » Fri Feb 22, 2013 11:12 pm

dude this looks so nice :D
what frame rate are you getting on this?


Postby dml » Sat Feb 23, 2013 10:16 am

kristjanga wrote:dude this looks so nice :D
what frame rate are you getting on this?


Pretty slow - around 1.6fps on BM307 (with sprites) and 3fps in the current version (no sprites). This is with the default view size.

[EDIT] ...turns out this test was done with full DSP handshaking enabled, so the actual speed is probably a bit higher than 3fps. Not checked it yet.

I would need to rebuild 307 with sprites off to compare properly but there's something like a 60% speed improvement now over the old code.

Things should get better for scenes like this one as optimisations go in, although simpler scenes may not improve so much. Gameplay would be more affected by framerate variability than maximum framerate (not good if you open a door at a difficult point in the game and it suddenly crawls because of the view), so it's better to focus on improving performance with a range of difficult scenes.


The main bottleneck now is adding new walls a section at a time (upper, middle, lower), on both the CPU and DSP side. This area needs quite a lot of work and will take some time to change, but it should be worthwhile.
Last edited by dml on Sat Feb 23, 2013 1:21 pm, edited 1 time in total.


Postby Eero Tamminen » Sat Feb 23, 2013 10:46 am

dml wrote:
kristjanga wrote:dude this looks so nice :D
what frame rate are you getting on this?


Pretty slow - around 1.6fps on BM307 (with sprites)..


Could you add here your latest Bad mood binary *with* CPU & DSP symbols for Hatari?

I would like to try it with my latest Hatari profiler code. I've just added preliminary [1] cost propagation code to the profile data post-processor and I'd like to take a look at what the callgraphs for more complex code look like with it.

[1] slow & inefficient (= exponential) Python code. I will improve it later.


Postby dml » Sat Feb 23, 2013 12:30 pm

Eero Tamminen wrote:Could you add here your latest Bad mood binary *with* CPU & DSP symbols for Hatari?

I would like to try it with my latest Hatari profiler code. I've just added preliminary [1] cost propagation code to the profile data post-processor and I'd like to take a look at what the callgraphs for more complex code look like with it.


Ok next time I rebuild I'll post the binary + LOD + LST files for the symbols.

The CPU side isn't very complex (compared with some medium-sized C projects) but the DSP side is quite large and not very typical so it may be useful for testing.

(So far I found the DSP side of Hatari debugger & profiler to be quite reliable and accurate, although I have not spent much time with DSP symbols yet).

