Bad Mood : Falcon030 'Doom'
Moderators: Zorro 2, Moderator Team
-
- Captain Atari
- Posts: 400
- Joined: Sat Jul 25, 2009 3:35 pm
Re: Bad Mood : Falcon030 'Doom'
big LIKE
-
- Fuji Shaped Bastard
- Posts: 3991
- Joined: Sat Jun 30, 2012 9:33 am
Re: Bad Mood : Falcon030 'Doom'

It will probably get faster still over the next few days - perhaps the same again. Depends on how soon it becomes DSP-limited and where it happens - some things are easier to move or overlap than others. Will see in a week or so.
I'm still finding many optimisations or ways to simplify/rewrite practically everywhere, both in BM itself and in alternative methods to those used in the Doom source, so it's getting a bit frustrating not finding the end of the rope and having my mind cluttered with random unrelated improvements all the time.
It's a consequence of having many 'moving parts' and I expect it can keep going on like this for some time, beyond the changes I had planned.
May have to just stop soon and get on with the game side, for a change of scene. Other improvements can resume later and better choices might be made too.
d:m:l
Home: http://www.leonik.net/dml/sec_atari.py
AGT project https://bitbucket.org/d_m_l/agtools
BadMooD: https://bitbucket.org/d_m_l/badmood
Quake II p/l: http://www.youtube.com/playlist?list=PL ... 5nMm10m0UM
-
- Fuji Shaped Bastard
- Posts: 3999
- Joined: Sun Jul 31, 2011 1:11 pm
Re: Bad Mood : Falcon030 'Doom'
dml wrote: May have to just stop soon and get on with the game side, for a change of scene. Other improvements can resume later and better choices might be made too.
Fully working game is often also more interesting to test and non-BSP code in Doom might also have large performance problems of its own...
-
- Atari God
- Posts: 1223
- Joined: Wed Nov 20, 2002 11:22 pm
- Location: France
Re: Bad Mood : Falcon030 'Doom'
Wow, congrats on the optimisations!
-= Personal pages hub = YM-Rockerz =-
-
- Fuji Shaped Bastard
- Posts: 3991
- Joined: Sat Jun 30, 2012 9:33 am
Re: Bad Mood : Falcon030 'Doom'
So it turned out that replacing the wall texturemapping calcs in BadMood wasn't as easy as I first thought. I forgot the level of pain involved the first time round, and had to deal with all the same stuff again this time round. doh.
The texturemapping itself isn't actually the hard bit (that was working yesterday). The clipping of those walls correctly in screen space is the hard bit, and involves an awful lot of crap that has to be exactly right or the result is unacceptably broken (or wobbles like playstation polys).
It took me just one try to get the z-clipping part right. It took me *several* tries and some head scratching to get the screenspace clipping to work at all, on top of that, in DSP fixed point (half of the faults are dynamic range or sign related, combined with the crazy curved space things are happening in anyway). And then another couple of goes to get rid of the wobbles. Painful!
Anyway it's correct now, except for a half-texel rounding error which causes everything to slide slightly to the right as it recedes into the distance. This one is not hard to fix so I'll leave it for another day.
Now that this (really horrible) bit is working, the result is actually more pixel-precise than the old version on top of being quicker. No glitches. And the CPU isn't involved any more.
The y-projection step for walls is still done on the CPU - it's the second oldest bit of code and this is mainly what's preventing a fully DSP'd analogue of Doom's R_AddLine and (nearly) all the stuff below it. This is next to be fixed and then I'll tidy up and put the optimization tasks on the shelf for a while.
d:m:l
-
- Fuji Shaped Bastard
- Posts: 3991
- Joined: Sat Jun 30, 2012 9:33 am
Re: Bad Mood : Falcon030 'Doom'
Back in action...
Speed is essential but precision makes it worth looking at
d:m:l
-
- Atari God
- Posts: 1266
- Joined: Wed Feb 11, 2004 4:34 pm
- Location: Middle Earth (Npton) UK
-
- Atari Super Hero
- Posts: 961
- Joined: Mon Oct 13, 2008 12:50 pm
- Location: west of London, UK
Re: Bad Mood : Falcon030 'Doom'
dml wrote: Back in action...
Speed is essential but precision makes it worth looking at
All really positive and looks awesome. Great work Doug!
-
- Fuji Shaped Bastard
- Posts: 3991
- Joined: Sat Jun 30, 2012 9:33 am
Re: Bad Mood : Falcon030 'Doom'
The y-projection and clipping calcs have now been moved to the DSP. This was also tricky to convert because the near clipping plane has a very small limit, allowing the player to get very close to walls, switches, window ledges etc. without clipping them. The DSP only has a 24bit divide which makes high-precision fixed point projection more difficult.
Anyway the important point here is that the CPU is no longer doing any 3D work. The CPU is still doing a lot of logic per wall surface - to figure out valid window/step heights from sector adjacency, offsetting wall textures etc. Nothing very complicated there but plenty of it happening per wall. I'm going to leave this alone for now but it can also be pushed onto the DSP later or otherwise speeded up.
Some of the older optimizations which took advantage of CPU/DSP speed ratios and concurrency tricks are now broken - it won't run in that mode any more. All that stuff will need reviewed again and re-applied. A small inconvenience for a big reduction in CPU code. It does also mean the FPS reference figures have dropped a bit relative to where they should be but this will be recovered later.
The DSP is still doing some things per wall surface that only need done once per map seg - this will be easy enough to move now that segs are buffered on the DSP and indexed for wall insertion.
Will be time for a profile run soon - to see what the damage is and how things have moved around. I expect some good and bad news and some changes will be needed but overall should look positive.
d:m:l
-
- Fuji Shaped Bastard
- Posts: 3999
- Joined: Sun Jul 31, 2011 1:11 pm
Re: Bad Mood : Falcon030 'Doom'
dml wrote: Will be time for a profile run soon - to see what the damage is and how things have moved around.
Will there be:
- any symbols in the program which I could use as indicators of code where frame starts and ends
- code for reading "default" WAD if no filename is given as arg
- not waiting for a keypress at startup, when using default WAD
?
With such things I could easily automate getting accurate timings and exact profile data separately for both:
- Rendering of a specific (e.g. first) frame, and
- Starting up of Bad Mood.
As Bad Mood startup takes a while (mostly due to mipmap generation), and profiling requires frequent startups with new Bad Mood versions, it would be nice to be able to automate all of that. I could then provide you scripts for doing it automatically with Hatari.

-
- Fuji Shaped Bastard
- Posts: 3991
- Joined: Sat Jun 30, 2012 9:33 am
Re: Bad Mood : Falcon030 'Doom'
Eero Tamminen wrote: Will there be:
- any symbols in the program which I could use as indicators of code where frame starts and ends
- code for reading "default" WAD if no filename is given as arg
- not waiting for a keypress at startup, when using default WAD
?
I can easily add those things for the next try.
The main thing I want to get out of the next profile run is a rough idea of how the bottlenecks have moved. I have almost certainly made some things worse by changing overlaps between CPU/DSP as it was hand optimised in earlier versions and that's all messed up now - the amount of dead time in some processes will have increased - much of it proportional to the amount of CPU code stripped out.
There are 2 areas which I now expect to see blocking significantly:
- Adding a map seg and waiting for the result of the visibility and occlusion tests before recording it for wall surface insertion later. This is the only high-level point where the CPU waits for a DSP result, breaking the unidirectional pipeline. But it has to do this while the surface (upper, mid, lower) insertion logic is still on the CPU. While the wait could be removed by moving all this code onto the DSP, it can be hidden for now by folding one iteration of the CPU segment processing loop over the previous DSP visibility result.
- Flushing/extracting pending visplane zonedata. When BSP order forces the sector floor or ceiling texture, luma or height to change, the DSP has to 'commit' the visplane somewhere so it can start building a new one. The old floor/ceiling drawing code required the visplanes to be pulled out of the DSP during BSP walk and stored in a buffer. The newer code buffers that inside the DSP until drawing, so the flush really only needs to emit the buffer start/end position for the CPU to track it during drawing. The CPU needs to wait for the defragmentation process (an important part of the commit) to complete to find the buffer end position, but again this can be folded over two events to remove the wait (e.g. get old result, start process for new result, exit).
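The "folding" described in both bullets is a form of software pipelining: start request N, then pick up the result of request N-1 while N is still in flight. The following is only a minimal Python sketch of that loop shape, not BadMood code. `FakeDSP`, `submit`, `collect` and the even/odd visibility rule are all hypothetical stand-ins for the real host port exchange.

```python
from collections import deque


class FakeDSP:
    """Hypothetical stand-in for the DSP host port; results return in order."""

    def __init__(self):
        self._pending = deque()

    def submit(self, seg):
        # Pretend visibility test (assumption): even seg ids are visible.
        self._pending.append((seg, seg % 2 == 0))

    def collect(self):
        # On real hardware the CPU would stall here if the DSP were not done.
        return self._pending.popleft()


def add_segs_blocking(dsp, segs):
    """Unfolded shape: submit a seg, then wait for its answer straight away."""
    visible = []
    for seg in segs:
        dsp.submit(seg)
        s, ok = dsp.collect()
        if ok:
            visible.append(s)
    return visible


def add_segs_folded(dsp, segs):
    """Folded shape: submit seg N, then handle the result for seg N-1."""
    visible = []
    in_flight = False
    for seg in segs:
        dsp.submit(seg)
        if in_flight:
            s, ok = dsp.collect()  # previous seg's result, likely ready by now
            if ok:
                visible.append(s)
        in_flight = True
    if in_flight:  # drain the one result still outstanding after the loop
        s, ok = dsp.collect()
        if ok:
            visible.append(s)
    return visible
```

Both functions return the same list; the folded one simply never asks for a result in the same iteration that requested it, at the cost of one extra drain step after the loop.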
So there are changes planned to deal with the expected blockages but it would be good to actually see the blockages first

Some of this I'll need to find out using the process I used before - the spreadsheet with rules to find the host blocking points. But having a fresh view of the dominant code and call frequency will also be very helpful.
It should be easier this time round because there are very few host exchange points left - it's just that the remaining ones will have been amplified by some unknown amount.
d:m:l
-
- Hardware Guru
- Posts: 4725
- Joined: Sat Sep 10, 2005 11:11 am
- Location: Kosice, Slovakia
Re: Bad Mood : Falcon030 'Doom'
dml wrote: And then another couple of goes to get rid of the wobbles. Painful!
Just curious -- how did you get rid of them? If I remember correctly, the wobbliness happens when you cut the floating part of the result of the clipping equation for u,v coordinates. Actually, I had no idea it can be fixed in a software renderer ;-)
-
- Fuji Shaped Bastard
- Posts: 3991
- Joined: Sat Jun 30, 2012 9:33 am
Re: Bad Mood : Falcon030 'Doom'
dml wrote: And then another couple of goes to get rid of the wobbles. Painful!
mikro wrote: Just curious -- how did you get rid of them? If I remember correctly, the wobbliness happens when you cut the floating part of the result of the clipping equation for u,v coordinates. Actually, I had no idea it can be fixed in a software renderer
Actually there is no single answer to this one because there are so many sources of wobble and glitches in software rendering that it's sometimes hard to be sure what's going on - it depends a lot on how it's written and there are lots of different methods

I have changed methods a few times to reduce different kinds of wobble - in texture u, and screen-y calcs etc. but there are some things which are helpful to apply, depending on the cause.
The wobble in my case gets caused by insufficient fraction bits either in a divide for an affine gradient (the gradients used to walk uz,z in screen pixels), or in the affine step values themselves which then get multiplied to index the clipping. It can also be a result of the perspective divide itself if the ranges involved cause too many bits to get lost (since the formula usually involves something like ((u*zV) / zI) and all terms need their own fraction bits).
On the one hand, the more you can move the precision-sensitive work away from inner loops the better. But sometimes doing this can actually make things worse. e.g. if you have to rely on affine gradients being super precise and the precision just isn't there in the first place (it's one thing to decide on a 48bit real for an affine gradient but if a divide generated only 12 useful fraction bits in the first place, that's not helping much).
The best way to deal with divides specifically is to maximise the number of bits involved - normalize the terms as a floating point unit would do (left shift both terms until the divisor is only just greater than the dividend, with dividend in the upper word). The divide then yields the maximum number of useful bits which you can then shift back into position - you also then have a chance to keep any lost bits in a 'carry' gradient term or offset your fixed point to take them into account.
This isn't quick of course because normalizing takes time, and then you need to denormalize at the end - but the DSP has a norm operation which helps and you don't need to do it very often if it's done only to prepare a gradient for a whole surface or edge and not inside the plotting loop (or sub-span loop).
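A minimal Python sketch of the normalized divide described above, under stated assumptions: `frac_div` is a hypothetical model of a fractional divider that needs dividend < divisor and yields 23 quotient bits, and `normalized_div` is an illustrative name, not a routine from BM. The point it shows is that pre-shifting the terms makes every quotient bit significant, with the shift count kept as an exponent for later denormalization.

```python
FRAC_BITS = 23  # quotient bits produced by the modelled divider (assumption)


def frac_div(n, d):
    """Model of a fractional divide: needs 0 <= n < d, returns floor(n/d * 2**23)."""
    assert 0 <= n < d
    return (n << FRAC_BITS) // d


def normalized_div(n, d):
    """Return (mantissa, exponent) with n/d ~= mantissa * 2**-(FRAC_BITS + exponent)."""
    assert n > 0 and d > 0
    exponent = 0
    while n >= d:            # make the divisor larger than the dividend
        d <<= 1
        exponent -= 1
    while (n << 1) < d:      # ...but only just larger, so no quotient bits are wasted
        n <<= 1
        exponent += 1
    return frac_div(n, d), exponent


def to_float(mantissa, exponent):
    """Denormalize for inspection only; real code would shift instead."""
    return mantissa / float(1 << (FRAC_BITS + exponent)) if FRAC_BITS + exponent >= 0 \
        else mantissa * float(1 << -(FRAC_BITS + exponent))
```

For 3/1000 the plain `frac_div(3, 1000)` leaves only about 15 significant bits, while `normalized_div(3, 1000)` returns a mantissa in the range 2**22 to 2**23 plus an exponent of 8, so the full width of the quotient carries information.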
You can of course write your own iterative divide but that's a bit nuts. Too slow. The method below is probably the best overall....
The next method is similar to the last but quicker - it generates a reciprocal by normalizing only the divisor. You can then use a 24x48bit multiply (7 ops on DSP) to perform something like a 48bit divide and get lots of precision since the denorm step can be done afterwards. This gives you similar results to a floating point reciprocal multiply - you just need to be careful the result remains within range in the final steps and enough headroom should be left to avoid overflow.
something like this:
dvn[24],dve[16] = norm(dv[48])
r[24] = 1 / dvn[24]
x[48] = y[48] * r[24]
x[48] >>= dve
Note that reciprocals still don't give you the same result as a full divide but there are multiple speed advantages (reuse & cacheing, only need to normalize one term, can be used in long muls).
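The four pseudocode steps above can be sketched in Python as follows. This is an illustration under assumptions, not the DSP routine: the helper names, the 23-bit mantissa and the 2**45 reciprocal scale are choices made for the sketch, and Python integers stand in for the 24/48-bit registers.

```python
MANT_BITS = 23    # normalized divisor lies in [2**22, 2**23) (assumption)
RECIP_SCALE = 45  # reciprocal is stored as floor(2**45 / mantissa)


def normalize(dv):
    """Split a positive divisor into (mantissa, exponent), dv ~= mantissa * 2**exponent."""
    assert dv > 0
    exp = dv.bit_length() - MANT_BITS
    mant = dv >> exp if exp >= 0 else dv << -exp
    return mant, exp


def reciprocal(mant):
    """Reciprocal of a normalized mantissa; the result fits in 24 unsigned bits."""
    return (1 << RECIP_SCALE) // mant


def div_by_reciprocal(y, recip, exp, out_frac_bits=0):
    """Approximate (y << out_frac_bits) / dv with one multiply and one shift."""
    prod = y * recip                          # the long multiply
    shift = RECIP_SCALE + exp - out_frac_bits  # denormalize afterwards
    return prod >> shift if shift >= 0 else prod << -shift
```

Because only the divisor is normalized, `normalize` and `reciprocal` can run once per surface or edge and the result can be reused for every `y` that shares the divisor. The result can differ from a true divide in the last bit, which matches the caveat about reciprocals not giving the same result as a full divide.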
Other kinds of wobble result from not indexing the attribute (e.g. u,v,z) terms properly when clipping. e.g. they are projected into screen space as reals, but need rounded into pixel space as integers. The fractions lost in rounding need kept, and used to index the attributes for correct rendering. When dealing with perspective-correct attributes, screenspace clipping is always in terms of (attribute*z) and not the attribute itself, or you get stretchy wobble effects.
e.g. something like..
Code: Select all
[r16.8] view_i1 = proj(mview, x1, z1)
[r16.8] view_i2 = proj(mview, x2, z2)
[i16] screen_i = clamp((view_i1 >> 8), 0, 320-1)
[r16.8] clip_i = (((screen_i << 8) - view_i1) + rounding) / (view_i2 - view_i1)
However the solution to wobbles in each case depends on the exact cause, the dynamic range of the numbers involved and the step value sizes in any gradients being used (so you may find that some of this stuff above is helpful, or unnecessary depending!).
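A small Python sketch of the sub-pixel correction in the clipping snippet above, assuming 16.8 fixed point for screen x and assuming that attributes are sampled at pixel centres (so `rounding` becomes half a pixel). `clip_fraction` and `attribute_at` are hypothetical names. The fraction lost when snapping the projected edge start to an integer column is kept and used to index whatever term is linear in screen space.

```python
FRAC = 8             # 16.8 fixed point, as in the [r16.8] terms
ONE = 1 << FRAC
HALF = ONE >> 1      # assumption: sample at pixel centres
T_BITS = 16          # fraction bits of the clip parameter t


def clip_fraction(view_x1, view_x2):
    """Return (first_column, t): t is how far along the edge that column's centre lies."""
    assert view_x2 > view_x1
    # First column whose centre is at or right of the true start, clamped to the left edge.
    col = max((view_x1 + HALF - 1) >> FRAC, 0)
    # Sub-pixel distance from the real start to that centre: the part rounding throws away.
    delta = (col << FRAC) + HALF - view_x1
    t = (delta << T_BITS) // (view_x2 - view_x1)
    return col, t


def attribute_at(a1, a2, t):
    """Start value of a screen-linear term at fraction t along the edge."""
    return a1 + (((a2 - a1) * t) >> T_BITS)
```

An edge starting at x = 10.25 begins drawing at column 10, but its attributes start a quarter pixel into the edge rather than at the endpoint values; an edge starting off-screen at x = -5.0 begins at column 0 with a correspondingly larger t. The sketch assumes edges lying entirely off-screen were rejected earlier, and that the interpolated terms are the z-combined ones described above rather than raw u or v.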
That was a bit of a wild ramble but there might be something useful in it.
d:m:l
-
- Ultimate Atarian
- Posts: 5790
- Joined: Mon Aug 16, 2004 12:06 pm
- Location: Prestonsburg, KY - USA
Re: Bad Mood : Falcon030 'Doom'
dml wrote: That was a bit of a wild ramble but there might be something useful in it.
Doug, we'll take your "wild ramble" over any organised programming seminars or conferences, any day.

Welcome To DarkForce! http://www.darkforce.org "The Fuji Lives.!"
Atari SW/HW based BBS - Telnet:darkforce-bbs.dyndns.org 1040
-
- Fuji Shaped Bastard
- Posts: 3999
- Joined: Sun Jul 31, 2011 1:11 pm
Re: Bad Mood : Falcon030 'Doom'
DarkLord wrote: Doug, we'll take your "wild ramble" over any organised programming seminars or conferences, any day.
Well put.
And what Doug mentioned is of course as applicable to 2D gfx/physics (e.g. when calculating multibody collisions), not just 3D although with 3D the problems can happen easier due to significantly larger number of calculations...
-
- Hardware Guru
- Posts: 4725
- Joined: Sat Sep 10, 2005 11:11 am
- Location: Kosice, Slovakia
Re: Bad Mood : Falcon030 'Doom'
Great food for thought, Doug! Thanks.
-
- Fuji Shaped Bastard
- Posts: 3991
- Joined: Sat Jun 30, 2012 9:33 am
Re: Bad Mood : Falcon030 'Doom'
mikro wrote: Great food for thought, Doug! Thanks.
Well after all that it turns out I do still have some wobbles to fix in some cases for very long walls. You see the best way to find bugs is always to pretend you don't have any

BTW I am trying to do something tricky (which hasn't worked for me so far in Hatari) - I want to try it first on a real machine but I may PM you with some details later in case you have any advice on making it work. At least you might have a good laugh at the attempt

d:m:l
-
- Fuji Shaped Bastard
- Posts: 3991
- Joined: Sat Jun 30, 2012 9:33 am
Re: Bad Mood : Falcon030 'Doom'
Eero Tamminen wrote: And what Doug mentioned is of course as applicable to 2D gfx/physics (e.g. when calculating multibody collisions), not just 3D although with 3D the problems can happen easier due to significantly larger number of calculations...
Actually I still have scars from the last time I was involved in RBD physics and collision detection / response. Wow that was painful. It's one of the hardest things to implement, can produce the most amazing variety of bugs and makes a complete mockery of floating point capabilities (I ended up making some very powerful NaN traceback tools).

The 2nd hardest thing to debug was probably building numerically stable multiresolution BSP trees - which also made a mockery of floating point but was at least easier to visualize the faults.
d:m:l
-
- Fuji Shaped Bastard
- Posts: 3999
- Joined: Sun Jul 31, 2011 1:11 pm
Re: Bad Mood : Falcon030 'Doom'
I have an additional idea that will further help profiling automation (in addition to render start & end symbols you already added):
- Having a symbol that, at the start of a frame, contains how many VBLs the previous frame took. With this, I can automatically provide profile data for a frame which goes over some externally defined VBL limit while one uses BM. It can give data for either the first or the last such frame.
- Or there could be a symbol that contains the largest number of VBLs taken by any frame so far, and which is initialized to a value which the program (normally) should not exceed. I can then save profile data whenever that value changes (i.e. increases), and as a result you have profiling data for the frame that took the most VBLs.
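The two proposed symbols can be modelled as below. This is a Python sketch of the bookkeeping only, with hypothetical names (`FrameVblTracker`, `last_frame_vbls`, `max_frame_vbls`); in the real program these would be plain memory locations that a debugger breakpoint watches.

```python
class FrameVblTracker:
    """Models a 'VBLs taken by previous frame' symbol and a 'worst so far' symbol."""

    def __init__(self, limit):
        self.vbl_count = 0           # bumped by the (simulated) VBL interrupt
        self._frame_start = 0        # vbl_count when the current frame began
        self.last_frame_vbls = 0     # first proposal: cost of the previous frame
        self.max_frame_vbls = limit  # second proposal: starts at the expected limit

    def on_vbl(self):
        self.vbl_count += 1

    def on_frame_start(self):
        """Update both symbols; return True when the worst-case value changed."""
        self.last_frame_vbls = self.vbl_count - self._frame_start
        self._frame_start = self.vbl_count
        if self.last_frame_vbls > self.max_frame_vbls:
            self.max_frame_vbls = self.last_frame_vbls
            return True  # the moment an external profiler would save its data
        return False
```

Initialising `max_frame_vbls` to the limit means the value only ever changes when a frame overruns, so a single "value changed" condition is enough to trigger saving a profile.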
-
- Fuji Shaped Bastard
- Posts: 3991
- Joined: Sat Jun 30, 2012 9:33 am
Re: Bad Mood : Falcon030 'Doom'
In fact this is probably most useful for scrolling or other frame-locked games which are sensitive to frame overruns. It's harder to imagine a real use case for something like BM because the VBL is not really an important kind of event and there isn't really any spiking behaviour to track (The engine doesn't even wait for a VBL, it triple-buffers to avoid the need for that).
If there are exotic bugs which cause a random frame to pause for a lengthy period it would be really good for tracking those down. I'm not aware of anything like that - if it exists, it's probably specific to frame 0 init only - but it's probably valuable to have if something like that was to surface.
It will however become *much* more useful when BM can replay demo loops with existing previous captures for the same demo. I imagine that's what you have in mind, given the auto-profiling scripting approach you've been working on?
d:m:l
-
- Fuji Shaped Bastard
- Posts: 3999
- Joined: Sun Jul 31, 2011 1:11 pm
Re: Bad Mood : Falcon030 'Doom'
dml wrote: It will however become *much* more useful when BM can replay demo loops with existing previous captures for the same demo. I imagine that's what you have in mind, given the auto-profiling scripting approach you've been working on?
One can use it also when manually playing the game, but automated demo runs are of course nicer with automated testing.

Besides the highest/max VBL variable, there might also be a smallest/min VBL variable to find out whether there are frames that did too little work. Often frames with too little work and too much work are adjacent, so this kind of information might help in balancing the work between frames.
In a game like Doom/BadMood where the frame content can vary wildly, this kind of testing should be done over a large variety of content (different WADs & demo runs), to first find where the worst frame is. Then one can produce a smaller test (e.g. by using Hatari memory snapshot save/load) which has just stuff around the frame with the worst FPS, so that one gets meaningful "best" FPS frame info close to it.
To know how close the "best" frame is, profile data should probably have some kind of "timestamp" one can compare. Would number of VBLs since boot be good for that? (Internal game frame count would of course be best, but I don't see any generic way to get that information to the profile data)
-
- Fuji Shaped Bastard
- Posts: 3991
- Joined: Sat Jun 30, 2012 9:33 am
Re: Bad Mood : Falcon030 'Doom'
Eero Tamminen wrote: In a game like Doom/BadMood where the frame content can vary wildly, this kind of testing should be done over a large variety of content (different WADs & demo runs), to first find where the worst frame is. Then one can produce a smaller test (e.g. by using Hatari memory snapshot save/load) which has just stuff around the frame with the worst FPS, so that one gets meaningful "best" FPS frame info close to it.
It will probably be more helpful with the game code present, since causes for spiking costs are far more likely - things waking up and performing actions, collisions getting into a weird state etc. This is also possible with current BM but not very likely - the main risk (collision code) will have to be overtaken by Doom code anyway for correct behaviour.
Eero Tamminen wrote: To know how close the "best" frame is, profile data should probably have some kind of "timestamp" one can compare. Would number of VBLs since boot be good for that? (Internal game frame count would of course be best, but I don't see any generic way to get that information to the profile data)
Would this not be better done by posting a message from the code? (this is how we did things for games profiling - some signals from the code help group & reference measurements) Or are you interested in making it work for projects for which there is no access to code?
[EDIT]
I'm wary of using VBL for anything since it's not used for synchronization at all. I'm also pretty sure TOS calls are avoided generally inside the mainloop - at least nothing that you could intercept. There is likely some behaviour in each application you could intercept if you cast your net wide enough but a one-solution-for-all isn't likely, without the app posting a friendly message to your profiler for that purpose.
d:m:l
-
- Fuji Shaped Bastard
- Posts: 3999
- Joined: Sun Jul 31, 2011 1:11 pm
Re: Bad Mood : Falcon030 'Doom'
Eero Tamminen wrote: To know how close the "best" frame is, profile data should probably have some kind of "timestamp" one can compare. Would number of VBLs since boot be good for that? (Internal game frame count would of course be best, but I don't see any generic way to get that information to the profile data)
dml wrote: Would this not be better done by posting a message from the code? (this is how we did things for games profiling - some signals from the code help group & reference measurements) Or are you interested in making it work for projects for which there is no access to code?
Note that I didn't mean counting Vsync() OS calls, but VBLs. That would be easy because Hatari already has a counter for that [1]. VBL count since boot would be used just for being able to approximately know how close the profile data files are to each other. And as you guessed, the good point is that it does not require any changes to the profiled code, i.e. it's a generic solution... This counter relates to generic profiler output, not to the automation done with breakpoint scripting & program specific symbols.
As for tracing & manually instrumenting the code (with which such messages are typically used), you could just enable suitable Hatari tracing options and use Hatari's Native Features API [2] for printing text from emulated program to the Hatari console at appropriate points.
It may be better if you think of the Hatari profiler as an uncommonly accurate sampling profiler, because while it traces every instruction, its output is similar to what sampling profilers give you.

[1] You could use VBL counter with breakpoints already in Hatari v1.3:
Code: Select all
b VBL = "VBL+20"
dml wrote: I'm wary of using VBL for anything since it's not used for synchronization at all.
BM doesn't synchronize screen output to avoid tearing?
If that is because it would slow the output too much, are you going to reconsider it after moving "enough" of the CPU load to DSP side?
-
- Fuji Shaped Bastard
- Posts: 3991
- Joined: Sat Jun 30, 2012 9:33 am
Re: Bad Mood : Falcon030 'Doom'
Eero Tamminen wrote: Would this not be better done by posting a message from the code? (this is how we did things for games profiling - some signals from the code help group & reference measurements) Or are you interested in making it work for projects for which there is no access to code?
Eero Tamminen wrote: Note that I didn't mean counting Vsync() OS calls, but VBLs. That would be easy because Hatari already has counter for that [1].
Unless the app is strictly synchronizing (or async, but averaging down over many VBLs), it is too coarse a time measurement, and I think the 'frame rounding' would yield a lot of false positives.
Eero Tamminen wrote: VBL count since boot would be used just for being able to approximately know how close the profile data files are to each other.
If it doesn't have to be precise then it is probably good enough. If it needs to be reasonably precise within a long run, there would need to be some kind of synchronization (with 'game ticks' or strict vsync on the app side - game ticks is always better, it never goes wrong because it's not a real-time counter).
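To put rough numbers on the 'frame rounding' concern, here is a minimal sketch. It assumes a 50 Hz VBL (20 ms period) and made-up frame times. The function name and values are hypothetical and not taken from BM or Hatari.

```python
# Hypothetical illustration: how many VBL ticks a frame "appears" to take
# when the only clock available is the VBL counter.
VBL_PERIOD_US = 20_000  # assumed 50 Hz display; a 60 Hz mode would differ


def vbls_spanned(start_us, duration_us, period_us=VBL_PERIOD_US):
    """Count VBL ticks falling in the interval (start, start + duration]."""
    return (start_us + duration_us) // period_us - start_us // period_us


# The same 90 ms frame reads as 4 or 5 VBLs depending on where it starts.
same_cost = (vbls_spanned(0, 90_000), vbls_spanned(15_000, 90_000))

# Two frames that differ by 10 ms can read as identical.
different_cost = (vbls_spanned(0, 85_000), vbls_spanned(0, 95_000))
```

At around 10fps a single VBL of rounding is roughly a fifth of the frame, so an unsynchronized app can show "changes" that are purely phase effects.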
Eero Tamminen wrote: As for tracing & manually instrumenting the code (with which such messages are typically used), you could just enable suitable Hatari tracing options and use Hatari's Native Features API [2] for printing text from emulated program to the Hatari console at appropriate points.
That's quite useful on its own. I'll be taking a closer look at some of that stuff soon.
Eero Tamminen wrote: It may be better if you think of Hatari profiler as uncommonly accurate sampling profiler, because while it traces every instruction, its output is similar to what sampling profilers give you.
Yes, it is in fact better because it is completely non-intrusive, in its own time universe. Most profilers interfere with profile gathering through their own overheads, which are sometimes very hard to hide.
Eero Tamminen wrote: BM doesn't synchronize screen output to avoid tearing? If that is because it would slow the output too much, are you going to reconsider it after moving "enough" of the CPU load to DSP side?
It doesn't need to synchronize to avoid tearing. It uses two backbuffers and logically it can't be writing one while also displaying it, due to timing. And yes, it does this to claim back 'dead time' that would be spent synchronizing. With the framerate unfixed and vsync active, the amount of dead time would vary and interfere noticeably with smooth camera motion, esp. as the framerate rises nearer 12fps.
And yes - I may have to remove it for practical reasons (fall back to 2 buffers and optional vsync vs tearing) on 4MB systems if the total memory load for BM + Doom is too great. On 14MB systems it will always triple-buffer though.
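For readers unfamiliar with the idea, a generic triple-buffer rotation can be sketched as below. This is not BM's actual code (which is described above as relying on timing). The class and method names are hypothetical, and the buffers are just indices.

```python
class TripleBuffer:
    """Generic sketch: one displayed buffer, one completed frame, one being drawn."""

    def __init__(self):
        self.front = 0      # buffer currently being displayed
        self.pending = 1    # most recently completed frame
        self.drawing = 2    # buffer the renderer is writing into
        self.fresh = False  # True when 'pending' holds a frame not yet shown

    def frame_done(self):
        # Renderer publishes its frame and carries on at once: no vsync wait.
        self.pending, self.drawing = self.drawing, self.pending
        self.fresh = True

    def on_vbl(self):
        # Display side: flip to the newest completed frame, if there is one.
        if self.fresh:
            self.front, self.pending = self.pending, self.front
            self.fresh = False


tb = TripleBuffer()
history = []
for event in ("done", "done", "vbl", "done", "vbl", "vbl", "done"):
    tb.frame_done() if event == "done" else tb.on_vbl()
    history.append((tb.front, tb.pending, tb.drawing))
```

The renderer never blocks and never draws into the displayed buffer, which is where the reclaimed 'dead time' comes from. The cost is the third screen buffer in memory.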
d:m:l
Home: http://www.leonik.net/dml/sec_atari.py
AGT project https://bitbucket.org/d_m_l/agtools
BadMooD: https://bitbucket.org/d_m_l/badmood
Quake II p/l: http://www.youtube.com/playlist?list=PL ... 5nMm10m0UM
-
- Fuji Shaped Bastard
- Posts: 3999
- Joined: Sun Jul 31, 2011 1:11 pm
Re: Bad Mood : Falcon030 'Doom'
Eero Tamminen wrote: VBL count since boot would be used just for being able to approximately know how close the profile data files are to each other.
dml wrote: If it doesn't have to be precise then it probably is good enough. If it needs to be reasonably precise within a long run there would need to be some kind of synchronization (with 'game ticks' or strict vsync on the app side - game ticks is always better, it never goes wrong because it's not a real-time counter).
The Hatari profiler already does precise measurement, by summing together all cycles spent between profile start and end. If you have breakpoint(s) on frame start/end, you get precise measurements of how long the frame took.
Whether the program's worst/best/min/max speed symbols count VBLs, game ticks or whatever doesn't matter much. The point was about finding whether there's some correlation between them, and whether something could be done to improve things by splitting costs...
Comparing VBL counter values (VBLs since boot) tells how many VBLs occurred between & during profiles. If none happened between them, they were for adjacent frames (one started where another ended). Alternatively the profile could just list how much time, e.g. in seconds, had passed from boot (at profile start & end), but I think the VBL counter is a good general measure for this for most games & demos, especially if they're doing vsync.
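As a rough sketch of that comparison, assume each profile recorded the VBL counter at its start and end. The field names and values here are hypothetical, not an existing Hatari output format.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    vbl_start: int  # VBLs since boot when profiling started (assumed field)
    vbl_end: int    # VBLs since boot when profiling ended (assumed field)


def vbls_during(p):
    """VBLs that occurred while the profile was being collected."""
    return p.vbl_end - p.vbl_start


def vbls_between(a, b):
    """VBLs that occurred between the end of profile a and the start of b."""
    return b.vbl_start - a.vbl_end


def adjacent(a, b):
    # Zero VBLs in between: approximately back-to-back frames
    # (only to VBL resolution, so a sub-VBL gap would go unnoticed).
    return vbls_between(a, b) == 0


best = Profile("best", vbl_start=1200, vbl_end=1205)
nxt = Profile("next", vbl_start=1205, vbl_end=1211)
worst = Profile("worst", vbl_start=1450, vbl_end=1458)
```

Here `adjacent(best, nxt)` holds while `vbls_between(nxt, worst)` shows the worst frame came much later in the run.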

Eero Tamminen wrote: As for tracing & manually instrumenting the code (with which such messages are typically used), you could just enable suitable Hatari tracing options and use Hatari's Native Features API [2] for printing text from emulated program to the Hatari console at appropriate points.
dml wrote: That's quite useful on its own. I'll be taking a closer look at some of that stuff soon.
Take a look at the tracing breakpoints too. They can show whenever values in your program (variables in memory, IO and normal register values) change, without interrupting it. I've used them to check for line-A calls in programs, track variable corruption etc. The syntax may not be the friendliest, but if you have some specific case, I can give examples (the Hatari manual's Debugger section also has some examples).
Or if you meant NatFeats debugger output, it may be easier to use BIOS output functions and Hatari's "--conout <bios dev>" option for outputting them to the console. NatFeats should be a bit faster though, as the OS doesn't need to do anything for that. EmuTOS has code on using the NatFeats API here:
http://emutos.cvs.sourceforge.net/viewv ... utos/bios/