This case is from the very beginning of the 4th Doom1 level (TIMEBASE_CONTROL=3):
Time spent in profile = 0.37274s.
Visits/calls:
- max = 794, in _P_PointOnDivlineSide at 0x30388, on line 3260
- 8745 in total
Executed instructions:
- max = 9600, in _subframe_block+456 at 0x1ef2c, on line 766
- 649621 in total
Used cycles:
- max = 192068, in _subframe_block+470 at 0x1ef3a, on line 771
- 5979720 in total
Instruction cache misses:
- max = 1110, in _P_PointOnDivlineSide+74 at 0x303d2, on line 3271
- 92849 in total
...
Executed instructions:
24.59% 159739 R_DrawSurface_NW
15.03% 15.04% 15.32% 97651 97711 99496 _subframe_block
8.62% 55986 MARKER
5.17% 33576 R_VisPlaneShader
4.32% 4.33% 20.96% 28050 28133 136167 _P_RunThinkers
3.62% 3.64% 5.78% 23529 23639 37535 _P_PointOnDivlineSide
2.29% 2.29% 2.29% 14888 14906 14906 _R_DrawColumn
2.05% 2.05% 8.00% 13308 13343 51953 _P_BlockLinesIterator
1.99% 2.00% 3.03% 12944 12986 19698 stream_texture
1.80% 1.80% 5.51% 11688 11709 35789 _PIT_AddLineIntercepts
1.50% 1.50% 2.68% 9718 9753 17396 _V_DrawPatch
1.26% 1.27% 1.27% 8176 8257 8257 stack_visplane_area
1.21% 1.21% 1.23% 7844 7844 7991 R_AddSpriteSpans
1.14% 1.14% 1.14% 7416 7416 7416 _R_PointInSubsector
0.98% 6392 R_VisPlaneSkyShader
0.94% 6114 R_StackTransparentSurface
0.91% 5913 R_AddLine_loop
0.88% 0.88% 12.83% 5691 5698 83350 _P_PathTraverse
0.82% 0.84% 1.43% 5327 5459 9275 _BM_P_CrossBSPNode
0.81% 0.81% 0.81% 5276 5283 5283 _P_UpdateSpecials
0.78% 5091 R_SwitchSurface_T2
0.75% 0.75% 4.48% 4894 4901 29086 _P_BlockThingsIterator
0.73% 0.74% 0.74% 4760 4787 4787 _P_InterceptVector
0.71% 0.71% 3.01% 4642 4642 19548 _R_DrawMaskedColumn
...
Instruction cache misses:
9.68% 9.75% 10.04% 8990 9050 9320 _P_PointOnDivlineSide
7.93% 7.94% 28.88% 7360 7375 26816 _P_BlockLinesIterator
7.87% 7.92% 47.84% 7311 7352 44416 _P_RunThinkers
7.73% 7.74% 18.85% 7178 7187 17501 _PIT_AddLineIntercepts
3.62% 3.62% 3.70% 3357 3357 3437 R_AddSpriteSpans
2.93% 2.95% 2.95% 2723 2737 2737 _P_InterceptVector
2.91% 2.91% 35.73% 2701 2704 33174 _P_PathTraverse
2.36% 2194 R_DrawSurface_NW
2.12% 2.12% 4.64% 1972 1972 4309 _PIT_AddThingIntercepts
2.08% 2.08% 34.58% 1931 1934 32107 _P_SetMobjState
1.98% 1838 MARKER
1.84% 1.84% 2.09% 1709 1712 1940 _PIT_CheckLine
1.72% 1.74% 1.86% 1600 1616 1727 R_ViewTestSpriteLines
1.65% 1.65% 6.84% 1533 1536 6353 _P_BlockThingsIterator
1.48% 1.48% 2.43% 1378 1378 2252 _R_DrawMaskedColumn
1.38% 1.39% 6.89% 1281 1292 6397 _P_CheckPosition
1.34% 1245 build_ssector
1.07% 1.08% 6.97% 995 1001 6475 R_FlushDeferredSurfaces
1.03% 1.03% 3.48% 958 958 3234 _R_DrawVisSprite
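For a rough feel of what the totals mean, a few ratios can be derived from the summary figures alone. This is only a back-of-the-envelope sketch: the variable names are mine, and the numbers are copied from the "in total" lines and the time reported at the top of the profile.

```python
# Totals copied from the profile summary above.
seconds = 0.37274
instructions = 649_621
cycles = 5_979_720
icache_misses = 92_849

# Average cost of one executed instruction, in CPU cycles.
cycles_per_instruction = cycles / instructions

# Instruction cache misses per 1000 executed instructions.
misses_per_1000_instructions = 1000 * icache_misses / instructions

# Clock rate implied by the cycle total and the profiled time.
implied_clock_mhz = cycles / seconds / 1e6

print(f"cycles/instruction:       {cycles_per_instruction:.2f}")
print(f"misses/1000 instructions: {misses_per_1000_instructions:.1f}")
print(f"implied clock:            {implied_clock_mhz:.2f} MHz")
```

The average works out to a bit over 9 cycles per instruction, and the implied clock comes out at roughly 16 MHz.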
The _subframe_block code using the most cycles:
...
$01ee06 : bra $1ee70 0.00% (16, 128, 0)
[...]
$01ee70 : move.l d0,d4 0.74% (4800, 19200, 32)
$01ee72 : swap d4 0.74% (4800, 19264, 16)
$01ee74 : move.b (a0,d4.w),d6 0.74% (4800, 57728, 16)
$01ee78 : add.l d1,d0 0.74% (4800, 68, 0)
$01ee7a : move.l (a1,d6.w*4),(a4)+ 0.74% (4800, 115200, 0)
$01ee7e : dbra d7,$1ee70 0.74% (4800, 19264, 0)
$01ee82 : bra $1ee86 0.00% (16, 128, 0)
$01ee86 : movea.w #1,a6 0.00% (16, 64, 0)
$01ee8a : move.w $2e(sp),d0 0.00% (16, 128, 0)
$01ee8e : cmp.w a6,d0 0.00% (16, 64, 16)
$01ee90 : ble $1ef50 0.00% (16, 128, 16)
$01ee94 : move.w d0,$2c(sp) 0.00% (16, 192, 16)
$01ee98 : movea.w a6,a0 0.00% (32, 0, 0)
$01ee9a : lea $b6340,a1 0.00% (32, 320, 16)
$01eea0 : movea.l (a1,a0.l*4),a5 0.00% (32, 576, 16)
$01eea4 : move.l $30(sp),d7 0.00% (32, 448, 16)
$01eea8 : subq.w #1,d7 0.00% (32, 64, 0)
$01eeaa : movea.l $34(sp),a4 0.00% (32, 384, 0)
$01eeae : move.l 4(a5),d0 0.00% (32, 384, 0)
$01eeb2 : move.l 8(a5),d1 0.00% (32, 384, 0)
$01eeb6 : movea.l (a5),a0 0.00% (32, 448, 16)
$01eeb8 : movea.l $14(a5),a1 0.00% (32, 448, 16)
$01eebc : clr.l d6 0.00% (32, 0, 0)
$01eebe : bra $1ef2c 0.00% (32, 256, 0)
[...]
$01ef2c : move.l d0,d4 1.48% (9600, 38336, 36)
$01ef2e : swap d4 1.48% (9600, 38468, 17)
$01ef30 : move.b (a0,d4.w),d6 1.48% (9600, 115324, 16)
$01ef34 : add.l d1,d0 1.48% (9600, 68, 2)
$01ef36 : move.l (a1,d6.w*4),d4 1.48% (9600, 153656, 0)
$01ef3a : add.l d4,(a4)+ 1.48% (9600, 192068, 18)
$01ef3c : dbra d7,$1ef2c 1.48% (9600, 38660, 16)
$01ef40 : bra $1ef44 0.00% (32, 320, 16)
...
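My reading of the two inner loops above, as a small Python model: a 16.16 fixed-point position (d0) is stepped by d1, its integer part indexes a byte table (a0), that byte indexes a table of longs (a1), and the result is either stored to the destination (loop at $1ee70) or added to it (loop at $1ef2c). The function and parameter names are hypothetical and this is not the original source, only a sketch of what the listed instructions appear to do. It treats the word index as unsigned, which assumes the index never goes negative. The second half sums the cycle column of each loop to get the cost per iteration.

```python
MASK32 = 0xFFFFFFFF

def span_pass(texture, lut, dest, pos, step, count, accumulate):
    """Model one pass of the inner loop (hypothetical names, see lead-in)."""
    for i in range(count):                  # dbra d7, with d7 = count - 1
        index = (pos >> 16) & 0xFFFF        # move.l d0,d4 / swap d4
        texel = texture[index]              # move.b (a0,d4.w),d6
        pos = (pos + step) & MASK32         # add.l d1,d0
        value = lut[texel]                  # move.l (a1,d6.w*4),...
        if accumulate:
            dest[i] = (dest[i] + value) & MASK32   # add.l d4,(a4)+
        else:
            dest[i] = value                 # move.l ...,(a4)+
    return dest

# Cycle column of each loop body, copied from the listing above.
store_loop_cycles = [19200, 19264, 57728, 68, 115200, 19264]          # $1ee70
add_loop_cycles = [38336, 38468, 115324, 68, 153656, 192068, 38660]   # $1ef2c

store_cycles_per_iteration = sum(store_loop_cycles) / 4800
add_cycles_per_iteration = sum(add_loop_cycles) / 9600

print(f"store loop: {store_cycles_per_iteration:.1f} cycles/iteration")
print(f"add loop:   {add_cycles_per_iteration:.1f} cycles/iteration")
```

With these figures the storing loop costs about 48 cycles per iteration and the accumulating loop about 60, with most of the difference sitting in the read-modify-write on (a4)+.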
Code causing the most instruction cache misses:
_P_PointOnDivlineSide:
$030388 : movem.l d2-d4,-(sp) 0.12% (794, 27116, 397)
$03038c : move.l $10(sp),d0 0.12% (794, 11120, 398)
$030390 : move.l $14(sp),d1 0.12% (794, 11120, 398)
$030394 : movea.l $18(sp),a0 0.12% (794, 11120, 398)
$030398 : move.l 8(a0),d4 0.12% (794, 12456, 718)
$03039c : bne.s $303b6 0.12% (794, 4776, 2)
[...]
$0303b6 : move.l $c(a0),d2 0.12% (794, 9528, 0)
$0303ba : bne.s $303d2 0.12% (794, 9280, 763)
[...]
$0303d2 : move.l d0,d3 0.12% (794, 1604, 1110)
$0303d4 : sub.l (a0),d3 0.12% (794, 9528, 0)
$0303d6 : sub.l 4(a0),d1 0.12% (794, 9528, 0)
$0303da : move.l d2,d0 0.12% (794, 2784, 696)
$0303dc : eor.l d4,d0 0.12% (794, 392, 0)
$0303de : eor.l d3,d0 0.12% (794, 3176, 696)
$0303e0 : eor.l d1,d0 0.12% (794, 3176, 0)
$0303e2 : blt.s $30402 0.12% (794, 6992, 696)
$0303e4 : asr.l #8,d2 0.08% (551, 2208, 2)
$0303e6 : asr.l #8,d3 0.08% (551, 4156, 488)
$0303e8 : muls.l d3,d2,d3 0.08% (551, 27640, 298)
$0303ec : move.w d3,d2 0.08% (551, 2204, 0)
$0303ee : swap d2 0.08% (551, 3396, 298)
$0303f0 : asr.l #8,d1 0.08% (551, 2204, 0)
$0303f2 : asr.l #8,d4 0.08% (551, 3396, 298)
$0303f4 : muls.l d4,d1,d4 0.08% (551, 27640, 298)
$0303f8 : move.w d4,d1 0.08% (551, 2204, 0)
$0303fa : swap d1 0.08% (551, 2656, 113)
$0303fc : cmp.l d1,d2 0.08% (551, 2204, 0)
$0303fe : sle d0 0.08% (551, 2656, 113)
$030400 : bra.s $30406 0.08% (551, 4408, 0)
$030402 : eor.l d3,d2 0.04% (243, 972, 138)
$030404 : slt d0 0.04% (243, 972, 0)
$030406 : extb.l d0 0.12% (794, 3812, 272)
$030408 : neg.l d0 0.12% (794, 3176, 0)
$03040a : movem.l (sp)+,d2-d4 0.12% (794, 31816, 0)
$03040e : rts
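For reference, here is what the visible part of that routine computes, written as a Python model of the disassembly. It assumes the divline structure holds x, y, dx, dy at offsets 0, 4, 8 and $c (which matches the original Doom source), and it only covers the general case: the dx == 0 and dy == 0 paths are elided in the listing, so the sketch refuses those inputs rather than guessing. The function names are mine.

```python
def fixed_mul(a, b):
    """16.16 fixed-point multiply (muls.l, then keep the middle 32 bits)."""
    return (a * b) >> 16

def point_on_divline_side(x, y, line_x, line_y, line_dx, line_dy):
    """Return 0 or 1 for the side of the divline the point (x, y) is on."""
    if line_dx == 0 or line_dy == 0:
        # These branches are not shown in the listing above.
        raise ValueError("axis-aligned divlines are not modelled here")

    dx = x - line_x                          # sub.l (a0),d3
    dy = y - line_y                          # sub.l 4(a0),d1

    # Quick decision from the sign bits alone (the eor.l chain and blt.s).
    if (line_dy ^ line_dx ^ dx ^ dy) < 0:
        return 1 if (line_dy ^ dx) < 0 else 0    # eor.l d3,d2 / slt d0

    # Otherwise compare the two cross-product terms at reduced precision.
    left = fixed_mul(line_dy >> 8, dx >> 8)
    right = fixed_mul(dy >> 8, line_dx >> 8)
    return 1 if left <= right else 0         # cmp.l d1,d2 / sle d0

# Example: a divline through the origin with direction (1.0, 1.0).
one = 0x10000
print(point_on_divline_side(2 * one, 0, 0, 0, one, one))
print(point_on_divline_side(0, 2 * one, 0, 0, one, one))
```

The arithmetic itself is small, which fits the profile: the routine's cost here is dominated by cache misses on entry and at the branch targets rather than by the two multiplies.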
Attached is a callgraph of how the cache misses pile up from it.