mikro wrote: ↑Tue Sep 26, 2023 8:08 am
Eero Tamminen wrote: ↑Mon Sep 25, 2023 10:42 pm
=> Surprisingly high cost from adding one boolean check to the channels loop. This means that while this change helps silent parts, it could degrade performance when something actually needs to be mixed.
Are you sure about this? Is it really that easy to compare mixCallback() with memset() always present vs. mixCallback() without it?
One can always look at the profiler's disassembly output, e.g. for the game list scrolling.
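For reference, here is a heavily simplified sketch of the two kinds of mixCallback() being compared. Variant A mirrors the older build below, where _memset is called once, unconditionally, before the channel loop. Variant B is only a guess at the shape of the optimization: the buffer is cleared before the first active channel, so a silent call skips memset() entirely, at the price of an extra boolean check in the loop. Channel, Slots, mixInto(), mixAlwaysClear() and mixLazyClear() are hypothetical names, not ScummVM's Audio::MixerImpl code. The 32 slots come from the loop bounds in the disassembly ($30..$b0 = 32 pointers), and the idea that a return value of 0 tells the caller to treat the buffer as silence is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, heavily simplified channel: adds a constant to every sample.
// This is NOT ScummVM's Audio::Channel.
struct Channel {
    int16_t value;
};

// 32 slots, matching the $30..$b0 pointer range walked in the disassembly.
constexpr int kNumSlots = 32;
using Slots = std::array<Channel*, kNumSlots>;

// Adds one channel's output into buf.
static void mixInto(const Channel& ch, int16_t* buf, size_t samples) {
    for (size_t i = 0; i < samples; ++i)
        buf[i] = static_cast<int16_t>(buf[i] + ch.value);
}

// Variant A (like the older build): clear unconditionally, then mix every active slot.
int mixAlwaysClear(const Slots& slots, int16_t* buf, size_t samples) {
    assert(buf != nullptr);
    std::memset(buf, 0, samples * sizeof(int16_t));
    int active = 0;
    for (Channel* ch : slots) {
        if (!ch)
            continue; // empty slot: still costs a load + test per iteration
        mixInto(*ch, buf, samples);
        ++active;
    }
    return active;
}

// Variant B (hypothetical sketch of the optimization): clear only before the
// first active channel. The `cleared` flag is the extra boolean check.
// ASSUMPTION: a return value of 0 tells the caller the buffer was not written
// and should be treated as silence.
int mixLazyClear(const Slots& slots, int16_t* buf, size_t samples) {
    assert(buf != nullptr);
    bool cleared = false;
    int active = 0;
    for (Channel* ch : slots) {
        if (!ch)
            continue;
        if (!cleared) { // extra check, evaluated for every active channel
            std::memset(buf, 0, samples * sizeof(int16_t));
            cleared = true;
        }
        mixInto(*ch, buf, samples);
        ++active;
    }
    return active; // 0 => memset() was skipped entirely
}
```

Both variants produce the same samples whenever at least one channel is active; they differ only in whether a silent call touches the buffer at all. Note that both still walk all 32 slots on every call.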
GCC13 ScummVM from 9th of September:
Audio::MixerImpl::mixCallback(unsigned char*, unsigned int):
$022cc19e subq.l #$8,sp 0.00% (18959, 18953, 0, 0)
$022cc1a0 movem.l d2-d6/a2-a6,-(sp) 0.00% (18959, 606677, 7, 0)
$022cc1a4 movea.l $34(sp),a3 0.00% (18959, 56877, 1, 18958)
$022cc1a8 move.l $38(sp),d5 0.00% (18959, 189572, 18961, 18959)
$022cc1ac move.l $3c(sp),d3 0.00% (18959, 75833, 7, 18958)
$022cc1b0 clr.l -(sp) 0.00% (18959, 94791, 5, 0)
$022cc1b2 pea.l $4(a3) 0.00% (18959, 75859, 3, 0)
$022cc1b6 moveq #$30,d6 0.00% (18959, 0, 0, 0)
$022cc1b8 add.l sp,d6 0.00% (18959, 170638, 18963, 0)
$022cc1ba move.l d6,-(sp) 0.00% (18959, 75836, 2, 0)
$022cc1bc jsr StackLock() 0.00% (18959, 246471, 18969, 0)
$022cc1c2 move.b #$1,$12(a3) 0.00% (18959, 151667, 8, 0)
$022cc1c8 move.l d3,-(sp) 0.00% (18959, 227508, 18961, 0)
$022cc1ca clr.l -(sp) 0.00% (18959, 56877, 0, 0)
$022cc1cc move.l d5,-(sp) 0.00% (18959, 56877, 2, 0)
$022cc1ce jsr _memset 0.00% (18959, 246476, 18967, 0)
$022cc1d4 adda.w #$18,sp 0.00% (18960, 75345, 6, 0)
$022cc1d8 tst.b $c(a3) 0.00% (18960, 341249, 18968, 0)
$022cc1dc beq.w $22cc27c <not taken> 0.00% (18960, 56892, 12, 0)
$022cc1e0 lsr.l #$2,d3 0.00% (18960, 37932, 12, 0)
$022cc1e2 lea.l $30(a3),a2 0.00% (18960, 75840, 18, 0)
$022cc1e6 move.l a3,d2 0.00% (18960, 37903, 0, 0)
$022cc1e8 addi.l #$b0,d2 0.00% (18960, 227453, 18983, 0)
$022cc1ee clr.l d4 0.00% (18960, 37899, 0, 0)
$022cc1f0 lea.l $22cc114(pc),a6 0.00% (18960, 75832, 23, 0)
$022cc1f4 lea.l $22cb2d4(pc),a5 0.00% (18960, 75832, 18, 0)
$022cc1f8 lea.l $2373a7e.l,a4 0.00% (18960, 227410, 18986, 0)
$022cc1fe movea.l (a2)+,a3 0.07% (606720, 2407559, 42, 436056)
$022cc200 tst.l a3 0.07% (606720, 1043771, 806, 0)
$022cc202 beq.b $22cc242 0.07% (606720, 1356475, 23447, 0)
[...]
$022cc242 cmp.l a2,d2 0.07% (606720, 1193018, 22, 0)
$022cc244 bne.b $22cc1fe 0.07% (606720, 1218395, 2240, 0)
$022cc246 move.l d6,-(sp) 0.00% (18960, 94710, 0, 0)
$022cc248 jsr ~StackLock() 0.00% (18960, 246408, 18954, 0)
$022cc24e addq.l #$4,sp 0.00% (18960, 18960, 0, 0)
$022cc250 move.l d4,d0 0.00% (18960, 37931, 11, 0)
$022cc252 movem.l (sp)+,d2-d6/a2-a6 0.00% (18960, 383489, 20, 150272)
$022cc256 addq.l #$8,sp 0.00% (18960, 18960, 0, 0)
$022cc258 rts 0.00% (18960, 499088, 37926, 1398)
GCC13 ScummVM from 25th of September:
Audio::MixerImpl::mixCallback(unsigned char*, unsigned int):
$022d33ee subq.l #$8,sp 0.02% (145072, 145054, 0, 0)
$022d33f0 movem.l d2-d7/a2-a6,-(sp) 0.02% (145072, 5077505, 64, 13)
$022d33f4 movea.l $38(sp),a3 0.02% (145072, 435747, 18, 145000)
$022d33f8 move.l $3c(sp),d6 0.02% (145072, 1450577, 145102, 145067)
$022d33fc move.l $40(sp),d4 0.02% (145072, 580397, 56, 145048)
$022d3400 clr.l -(sp) 0.02% (145072, 725366, 53, 0)
$022d3402 pea.l $4(a3) 0.02% (145072, 580560, 61, 1)
$022d3406 pea.l $34(sp) 0.02% (145072, 580304, 20, 1)
$022d340a jsr StackLock() 0.02% (145072, 3191622, 290228, 19)
$022d3410 move.b #$1,$12(a3) 0.02% (145072, 1160681, 108, 4)
$022d3416 lea.l $30(a3),a5 0.02% (145072, 173269, 3533, 0)
$022d341a move.l a3,d2 0.02% (145072, 286613, 4, 1)
$022d341c addi.l #$b0,d2 0.02% (145072, 870449, 36, 3)
$022d3422 adda.w #$c,sp 0.02% (145072, 580288, 25, 3)
$022d3426 clr.b d3 0.02% (145072, 290123, 8, 4)
$022d3428 clr.l d5 0.02% (145072, 308666, 2724, 0)
$022d342a lea.l $22d3364(pc),a2 0.02% (145072, 577579, 15, 1)
$022d342e lea.l _memset,a4 0.02% (145072, 870426, 25, 5)
$022d3434 movea.l (a5)+,a6 0.50% (4642304, 16439633, 1995, 3620598)
$022d3436 tst.l a6 0.50% (4642304, 8261466, 120, 25)
$022d3438 beq.b $22d3482 0.50% (4642304, 10316942, 152646, 36)
[...]
$022d3482 cmp.l a5,d2 0.50% (4642304, 9136750, 184, 37)
$022d3484 bne.b $22d3434 0.50% (4642304, 9291175, 3615, 24)
$022d3486 pea.l $2c(sp) 0.02% (145072, 1015359, 24, 8)
$022d348a jsr ~StackLock() 0.02% (145072, 1887882, 145322, 7)
$022d3490 addq.l #$4,sp 0.02% (145072, 145143, 39, 0)
$022d3492 move.l d5,d0 0.02% (145072, 290091, 0, 0)
$022d3494 movem.l (sp)+,d2-d7/a2-a6 0.02% (145072, 3776591, 83, 1014068)
$022d3498 addq.l #$8,sp 0.02% (145072, 145060, 4, 1)
$022d349a rts 0.02% (145072, 1305879, 145228, 145049)
The first obvious thing is that although the use case is the same, in the latter case mixCallback() is called 145072 vs. 18960 times, i.e. 7.7x more often.
However, the most noticeable rendering functions in the profile are still called about the same number of times:
- drawString(): 7748 vs. 8096 = 0.96x
- drawChar(): 359942 vs. 376787 = 0.96x
- c2p1x1_8_rect_start(): 5626936 vs. 5760568 = 0.98x
- c2p1x1_8_rect_pix16(): 5626496 vs. 5760128 = 0.98x
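The ratios above can be checked, and put on a common scale, by normalizing mixer calls to drawString() calls. Using drawString() as a rough proxy for screen updates during the game list scrolling is my assumption; ratio() and mixerCallsPerDraw() are hypothetical helpers, and the constants are the call counts copied from the two profiles.

```cpp
#include <cassert>

// Ratio of two call counts, e.g. new build vs. old build.
double ratio(double a, double b) {
    assert(b != 0.0);
    return a / b;
}

// mixCallback() calls per drawString() call. drawString() is used only as a
// rough proxy for screen updates (assumption).
double mixerCallsPerDraw(double mixCalls, double drawStringCalls) {
    assert(drawStringCalls != 0.0);
    return mixCalls / drawStringCalls;
}

// Counts copied from the two profiles (9th vs. 25th of September builds).
constexpr double kMixOld = 18960, kMixNew = 145072;
constexpr double kDrawOld = 7748, kDrawNew = 8096;
```

With these counts, mixerCallsPerDraw() gives about 2.4 for the older build vs. about 17.9 for the newer one, i.e. roughly 7.3x more mixer calls per screen update even after correcting for the slightly higher drawString() count.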
In the Putt-Putt demo intro case, by contrast, the mixer callback is still called about the same number of times in both builds (as are, e.g., the byleRLEDecode() and drawStripToScreen() functions from its profile).
This explains why mixCallback() overhead increased radically in the silent scrolling case even though the memset() calls went away, and why I did not see the same overhead in the Putt-Putt case.
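The profiles also show why the fixed part of mixCallback() matters here: the loop head ($022cc1fe in the older build, $022d3434 in the newer one) executes exactly 32 times per call in both builds (606720 / 18960 and 4642304 / 145072), so every call walks all channel slots whether or not anything is playing. A simple linear cost model makes the consequence explicit; iterationsPerCall() and totalCost() are hypothetical helpers, and the cycle parameters are placeholders, not measured values.

```cpp
#include <cassert>
#include <cstdint>

// Channel-loop iterations per mixCallback() call, derived from the execution
// count of the loop head vs. the count of the function entry in the profile.
constexpr uint64_t iterationsPerCall(uint64_t loopHeadCount, uint64_t entryCount) {
    return loopHeadCount / entryCount;
}

// Linear cost model: fixed per-call work (StackLock, register save/restore,
// setup) plus per-slot work for every slot, active or not. The cycle values
// passed in are hypothetical placeholders to be filled from a real profile.
constexpr uint64_t totalCost(uint64_t calls, uint64_t fixedPerCall,
                             uint64_t slots, uint64_t perSlot) {
    return calls * (fixedPerCall + slots * perSlot);
}
```

Whatever the real per-call cost is, dropping memset() only shrinks one term inside the parentheses, while a 7.7x increase in calls multiplies the whole expression.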
mikro wrote: ↑Tue Sep 26, 2023 8:08 am
Or to put it another way, how big a performance decrease are we talking about? By definition, a few cycles per loop iteration have to be added, as I did add the check, but it certainly shouldn't be something to worry about.
EDIT: Sorry, I overlooked your previous message. OK, makes sense then. So the conclusion is that skipping memset() did help in silent scenes, right?
Yes, but now the big question is how the mixer callback can end up being called nearly 8x more often in the silent case after the optimization.
Optimizing audio handling does not help much if it just gets called that much more often relative to screen updates...
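This does not answer the question, but one generic relationship is worth keeping in mind (an assumption about how the backend drives the mixer, not something the profiles show): a callback that fills a fixed-size buffer has to run sample_rate / frames_per_buffer times per second. At a constant output rate, roughly 8x more calls would then mean roughly 8x smaller buffers, or the callback being invoked more often than needed to keep the output fed. The 4 bytes per frame (stereo, 16-bit) matches the `lsr.l #$2` in the older build's disassembly; callbacksPerSecond() is a hypothetical helper and the rates and buffer sizes in the usage note are made-up examples.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per sample frame for stereo, 16-bit output (matches the `lsr.l #$2`,
// i.e. len >> 2, in the older build's disassembly).
constexpr uint32_t kBytesPerFrame = 2 * sizeof(int16_t);

// How often a fixed-size buffer callback must run to keep the output fed:
// sampleRate frames per second divided by frames per callback.
double callbacksPerSecond(uint32_t sampleRate, uint32_t bufferBytes) {
    uint32_t framesPerBuffer = bufferBytes / kBytesPerFrame;
    assert(framesPerBuffer > 0);
    return static_cast<double>(sampleRate) / framesPerBuffer;
}
```

For example, callbacksPerSecond(32000, 4096) is 31.25 while callbacksPerSecond(32000, 512) is 250, i.e. exactly 8x as many calls for the same amount of audio.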
