Eero Tamminen wrote:
Is that both with "--desktop" set to "yes" and "no"?
(With "yes", the desktop resolution isn't changed; the Falcon display is zoomed to the integer multiple of the Falcon resolution closest to the desktop resolution.)
I tried it recently with "yes" for the desktop arg and got the same result: fullscreen mode doesn't enter fullscreen 'properly' but instead blits the image onto the desktop at 2x zoom, with a black border around it. The OS menu bar is still visible, etc. In this mode it's much slower than windowed mode with the usual 2x zoom. However, it could be a result of the way I built it; I haven't tried an official release to see if it's the same.
Eero Tamminen wrote:
Sounds reasonable. Have you had time yet to look into the expensive audio part?
I looked at it briefly but remembered that the left/right panning needs to be done and tested before optimizing the mixer - in case I have to change it or drop it for some reason. This is why the 'slow' mixer is enabled just now (there are other semi-optimized versions but they are turned off).
I've been working on some other crazy thing for a few days, and repairing some mainboards before that. Will return to it next week.
Eero Tamminen wrote:
Executed instructions:
7.98% 7.98% 7.98% 10626773 10632373 10632373 _subframe_block
5.83% 5.84% 6.29% 7764088 7775516 8382794 _R_PointInSubsector
5.43% 5.43% 5.75% 7228956 7239562 7663211 D_AllocPreview_Dynamic
I see the memory debugger is left on again :-/ I'll need to turn that off. Was enabled for tracking mipmap allocation.
Eero Tamminen wrote:
DSP side:
...
Most of time in R_DoColumnPerspCorrect goes to:
p:045e 0aa981 00045e (06 cyc) jclr #1,x:$ffe9,p:$045e 9.86%
That's fine - it means the DSP is mostly ahead of things, waiting for the CPU to finish a column and fetch a new one.
This is actually one of the things I was interested to track in a better way than I have been so far - it's quite effort intensive right now.
I'm wondering if it's possible for the profiler to analyze spinloops - any pair of instructions which result in a tight loop, on either CPU or DSP.
Originally I was interested in detecting just DSP host port spinloops because they are relatively easy to decode on the DSP - they always look the same. On the CPU side however, they can vary a bit and it's harder.
I then realized that any 2-opcode loop is a spinloop, and the semantics are going to be similar so it's probably easier/better to just detect all of them and profile them in the same way (blitter spinloops could also benefit from this). So on the CPU side any pair of ops where the 2nd is a branch back to the first - doesn't matter what the first opcode happens to be.
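To make the detection rule concrete, here is a minimal sketch in Python. It assumes the disassembler can already supply each instruction's address and, for branches, the destination address; the `Insn` record and `find_spinloops` function are hypothetical names for illustration, not part of any existing profiler. It also treats a single instruction that branches to itself (like the DSP `jclr` above) as a spinloop, which is an extension of the 2-opcode rule.

```python
from typing import List, NamedTuple, Optional, Tuple

class Insn(NamedTuple):
    addr: int                # address of the instruction
    text: str                # disassembly text, informational only
    target: Optional[int]    # branch destination, None for non-branches

def find_spinloops(insns: List[Insn]) -> List[Tuple[int, int]]:
    """Return (head, branch) address pairs for candidate spinloop sites.

    insns must be in address order. A site is either a branch that
    targets the instruction directly before it (2-opcode loop) or a
    branch that targets itself (1-opcode loop).
    """
    sites = []
    prev = None
    for cur in insns:
        if cur.target is not None:
            if cur.target == cur.addr:
                sites.append((cur.addr, cur.addr))
            elif prev is not None and cur.target == prev.addr:
                sites.append((prev.addr, cur.addr))
        prev = cur
    return sites

# Illustrative listing only; the CPU lines are made up for the example.
listing = [
    Insn(0x100, "move.w d0,d1", None),
    Insn(0x102, "btst #0,(a0)", None),
    Insn(0x106, "beq.s $102", 0x102),
    Insn(0x45E, "jclr #1,x:$ffe9,p:$045e", 0x45E),
]
for head, branch in find_spinloops(listing):
    print(f"spinloop site: {head:#06x}..{branch:#06x}")
```

The first opcode is never inspected, which matches the point that it doesn't matter what it happens to be.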
The general idea is to track the activity of spinloops one level higher than the instruction counts. Specifically, recording the minimum, maximum and average iteration count for each spinloop recognized. A digest of the spinloop sites with these metrics is immensely useful because it becomes possible to spot a stall which occurs only infrequently but for a significant duration. It also helps pinpoint those spinloops which never iterate due to a favourable performance ratio, and can probably be removed.
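A rough sketch of that bookkeeping, again in Python. It assumes the profiler can see the stream of executed PC values and already has a list of spinloop sites as (head address, branch address) pairs; `SpinStats` and `profile_spinloops` are hypothetical names. "Iterations" here means the number of times the branch was taken back to the head during one visit, so a loop that falls straight through records 0. Interrupts landing inside a loop are ignored for simplicity.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Site = Tuple[int, int]  # (head address, branch address)

class SpinStats:
    """Min/max/average taken-branch count per visit to one spinloop."""
    def __init__(self) -> None:
        self.visits = 0
        self.total = 0
        self.min: Optional[int] = None
        self.max = 0

    def record(self, spins: int) -> None:
        self.visits += 1
        self.total += spins
        self.min = spins if self.min is None else min(self.min, spins)
        self.max = max(self.max, spins)

    @property
    def average(self) -> float:
        return self.total / self.visits if self.visits else 0.0

def profile_spinloops(trace: Iterable[int], sites: List[Site]) -> Dict[Site, SpinStats]:
    head_of = {branch: head for head, branch in sites}
    stats = {site: SpinStats() for site in sites}
    pending = None   # branch address executed on the previous step, if any
    spins = 0        # taken back-branches in the current visit
    for pc in trace:
        if pending is not None:
            if pc == head_of[pending]:
                spins += 1                       # branch looped back
            else:
                stats[(head_of[pending], pending)].record(spins)
                spins = 0                        # visit ended
        pending = pc if pc in head_of else None
    if pending is not None:                      # trace ended on a branch
        stats[(head_of[pending], pending)].record(spins)
    return stats

# Made-up trace: one visit that spins twice, one that falls straight through.
trace = [0x100, 0x102, 0x106, 0x102, 0x106, 0x102, 0x106, 0x108,
         0x102, 0x106, 0x108]
for site, s in profile_spinloops(trace, [(0x102, 0x106)]).items():
    print(f"{site[0]:#06x}: visits={s.visits} min={s.min} max={s.max} avg={s.average:.1f}")
```

In a digest like this, a site with a large max but a small average is the infrequent-but-long stall, and a site whose max stays at 0 is a candidate for removal.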
I think the most valuable side to watch is the CPU side, because the CPU should not be waiting for the DSP except in some rare cases for vector operations where inputs and outputs follow each other closely. Watching the DSP side is also useful though - it can give some indication of where there are unused/idle cycles which could absorb nearby work if some code is moved, etc.
Anyway, see what you think.

The way I did this before involved a very large and unresponsive spreadsheet with the profiler disasm pasted into it and some simple column calcs to spot the stalls. It only handled the DSP side but it 'inferred' the CPU side by counting any DSP spinloops with low iteration counts (relative to neighbour ops), as a DSP bottleneck (i.e. CPU side is spinning). It's better though to actually track the CPU side since buffering between the two sides makes 'inferring' a bit less reliable.
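For reference, the spreadsheet column calc amounts to something like the following Python sketch. It assumes per-instruction execution counts taken from the profiler disassembly, plus a set of addresses known to be single-instruction DSP spinloops; the function name and the 1.5 threshold are made up for illustration. Each spinloop's count is compared with the instruction that follows it, which runs once per exit from the loop, so a ratio near 1 means the loop hardly ever spun.

```python
from typing import List, Set, Tuple

def infer_dsp_bound(counts: List[Tuple[int, int]],
                    spin_addrs: Set[int],
                    threshold: float = 1.5) -> List[Tuple[int, float]]:
    """Flag DSP spinloops that rarely spin (DSP is the likely bottleneck).

    counts: (address, executed count) pairs in address order.
    Returns (address, passes-per-visit ratio) for each flagged spinloop.
    """
    flagged = []
    for (addr, n), (_, next_n) in zip(counts, counts[1:]):
        if addr in spin_addrs and next_n > 0:
            ratio = n / next_n          # average passes per visit
            if ratio < threshold:
                flagged.append((addr, ratio))
    return flagged

# Made-up counts: the loop at 0x45e spins a lot (DSP waiting on the CPU),
# the loop at 0x472 almost never spins (CPU probably waiting on the DSP).
counts = [(0x45C, 1000), (0x45E, 50000), (0x460, 1000),
          (0x470, 2000), (0x472, 2100), (0x474, 2000)]
for addr, ratio in infer_dsp_bound(counts, {0x45E, 0x472}):
    print(f"p:{addr:04x} ratio {ratio:.2f} -> likely DSP-bound")
```

This only sees aggregate counts, so it shares the weakness described above: with buffering between the two sides, a low ratio on the DSP doesn't prove the CPU was actually spinning at that moment.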