Regarding Doom II PWADs...
mtfactor
I think the Doom II mtfactor.wad PWAD also looks better with flat shading, and now I could pick up all the pickups. There's a very minor drawing issue with it, though: the water at the beginning of the level has a glitch that flickers when one moves around it:
grab0003.png
polygon
When trying to play a third round of polygon.wad, BM gave an allocation error on level reload, so maybe there's some leak related to PWADs:
Error: Z_Malloc: failed on allocation of 188184 bytes
After that, it happened immediately on the next BM start with that PWAD if I let BM run the timedemo at startup. So it might also just be that there are too many objects...?
A fairly complete polygon.wad profile looks like this:
Time spent in profile = 535.71525s.
...
Executed instructions:
30.13% 304182591 R_AdvanceSurface_flatshade
12.94% 12.96% 30.88% 130626931 130818507 311747445 _P_RunThinkers
10.60% 106973251 R_VisPlaneFlatShader
5.61% 56624325 R_SpriteColumnShader_Masked2
4.87% 4.91% 5.09% 49136991 49565929 51348878 _BM_P_CrossBSPNode
3.09% 3.10% 8.86% 31215040 31271916 89503964 _P_LookForPlayers
2.56% 25871647 R_BSPHyperPlane
2.01% 2.01% 2.06% 20295815 20310902 20774471 _R_PointInSubsector
1.84% 1.84% 1.84% 18569278 18580653 18580653 _BM_A_Mux1x2
1.62% 1.62% 6.80% 16337468 16403323 68654426 _BM_P_CheckSight
1.41% 1.42% 1.48% 14272481 14309829 14902035 _P_UpdateSpecials
1.33% 1.36% 1.41% 13467053 13686390 14206530 stack_visplane_area
1.31% 13210554 R_AddLine_loop
1.28% 1.28% 3.21% 12915461 12947492 32419218 _P_CheckPosition
1.05% 10648039 R_DrawSurface_flatshade
1.02% 10285150 R_StackTransparentSurface
0.99% 0.99% 15.52% 9954495 9979682 156681507 _P_SetMobjState
0.94% 0.94% 0.98% 9506542 9528448 9877880 R_ViewTestSpriteLines
0.92% 0.92% 0.94% 9297293 9309646 9516556 D_FlatMipGen_8_16
0.66% 0.66% 0.84% 6674584 6684270 8520398 get_flat_floor
0.66% 6622005 build_ssector
0.63% 0.63% 0.68% 6393684 6408171 6837689 R_AddSpriteSpans
0.55% 0.55% 1.79% 5581860 5593735 18054244 get_ssector
0.53% 0.53% 0.58% 5324029 5333660 5867747 _R_ClearPlanes
...
Instruction cache misses:
15.32% 15.41% 62.80% 19039955 19151802 78030973 _P_RunThinkers
9.13% 9.16% 13.12% 11347744 11376999 16303264 _BM_P_CheckSight
8.93% 8.95% 20.93% 11094556 11123719 26005355 _P_LookForPlayers
5.27% 5.28% 41.54% 6549856 6562391 51621294 _P_SetMobjState
3.67% 3.68% 7.38% 4564389 4578661 9171269 _P_CheckPosition
3.66% 3.67% 27.19% 4546738 4554652 33787288 _A_Look
DSP side, used cycles:
46.68% 8023290492 command_base
18.70% 20.96% 20.96% 3214196414 3602734920 3602734920 R_DoColumnPerspCorrect
10.02% 1722076216 R_VPRenderFlat
3.63% 624473260 ALGO_P_CrossBSPNode
2.99% 2.99% 2.99% 513827442 513827442 513827442 extract_subvisplane
2.73% 470084810 R_DoColumnTextureUV
1.90% 326865036 R_ViewTestAddLine
1.77% 2.15% 4.41% 304555474 368780408 757771202 AddLowerWall
1.65% 1.90% 19.28% 284374444 326667230 3314239434 AddMidWall
1.43% 245043468 P_CrossSubsector_body
1.38% 237275546 project_node
0.91% 156767514 R_CheckBBoxPair
0.75% 0.89% 3.09% 128825108 153581568 530941548 AddUpperWall
0.51% 0.51% 0.51% 87862150 87870890 87870890 InterceptVectorsUF
I don't think there's anything really interesting there. However...
The worst render frame does this:
Time spent in profile = 1.67425s.
Visits/calls:
97.38% 97.61% 52412 52536 correct_element
0.62% 334 init_font
0.59% 319 R_AdvanceSurface_flatshade
Executed instructions:
34.67% 34.77% 35.98% 995904 998743 1033627 correct_element
21.16% 21.19% 21.59% 607832 608730 620228 D_FlatMipGen_24_16
10.86% 10.88% 44.81% 312006 312447 1287281 create_local_palette_64levels
8.74% 8.76% 9.00% 251189 251778 258541 D_TextureRemapPixelSubBlock_16_8
8.45% 8.47% 12.22% 242673 243195 350951 D_FlatMipRemap_16_8
6.84% 6.86% 7.02% 196634 197010 201770 D_TextureRemapInit
3.19% 91581 R_AdvanceSurface_flatshade
2.28% 2.29% 2.29% 65624 65664 65664 _BM_A_Mux1x2
1.08% 1.09% 1.13% 31158 31220 32344 D_PatchTargetGetApproxRGB
0.70% 0.70% 0.70% 20042 20053 20053 D_FlatPackMipPages
0.69% 0.69% 2.81% 19811 19873 80631 D_FlatCLUTGen_32x32
...
Instruction cache misses:
34.38% 17130 correct_element
33.06% 33.11% 66.47% 16473 16498 33121 D_FlatCLUTGen_32x32
4.74% 5.09% 15.29% 2362 2534 7617 create_local_palette_64levels
4.35% 4.35% 7.68% 2167 2167 3826 _subframe_block
4.04% 2013 init_font
3.29% 3.33% 3.33% 1637 1659 1659 _BM_A_Mux1x2
2.53% 2.53% 2.53% 1261 1261 1261 _frame_event
2.29% 3.16% 5.20% 1142 1574 2592 D_FlatMipGen_24_16
1.55% 1.55% 11.76% 773 773 5860 _audio_mux_frame
It seems I hit something I hadn't hit earlier?
Is there something in the instruction cache misses that could be "easily" improved? The cache miss callgraph is attached, and the profile data looks like this:
correct_element:
$055f1a : mulu.w d5,d7 1.82% (52416, 1481468, 5389)
$055f1c : neg.l d7 1.82% (52416, 209720, 15)
$055f1e : add.l #$ffff,d7 1.82% (52416, 433576, 3170)
$055f24 : move.l d7,d6 1.82% (52416, 209752, 10)
$055f26 : mulu.l d7,d7 1.82% (52416, 2517336, 0)
$055f2a : clr.w d7 1.82% (52416, 209720, 3166)
$055f2c : swap d7 1.82% (52416, 212536, 2)
$055f2e : mulu.w $55f5c(pc),d7 1.82% (52416, 1888352, 0)
$055f32 : move.w #$100,d4 1.82% (52416, 209664, 0)
$055f36 : sub.w $55f5c(pc),d4 1.82% (52416, 421232, 0)
$055f3a : mulu.w d4,d6 1.82% (52416, 1473040, 1081)
$055f3c : add.l d6,d7 1.82% (52416, 205452, 7)
$055f3e : lsr.l #8,d7 1.82% (52416, 214524, 1061)
$055f40 : neg.l d7 1.82% (52416, 209888, 2)
$055f42 : add.l #$ffff,d7 1.82% (52416, 424016, 1061)
$055f48 : bpl.s $55f4c 1.82% (52416, 420008, 0)
[...]
$055f4c : cmp.l #$ffff,d7 1.82% (52416, 419884, 0)
$055f52 : bmi.s $55f5a 1.82% (52416, 424068, 1063)
[...]
$055f5a : rts 1.82% (52416, 638232, 1103)
...
D_FlatCLUTGen_32x32:
$04f8fc : lea $800(a2),a2 0.00% (1, 8, 1)
$04f900 : moveq #$1f,d7 0.00% (1, 4, 0)
$04f902 : move.w d7,-(sp) 0.00% (32, 256, 16)
$04f904 : moveq #0,d5 0.00% (32, 24, 0)
$04f906 : move.w d7,d5 0.00% (32, 128, 6)
$04f908 : lsl.w #8,d5 0.00% (32, 128, 0)
$04f90a : divu.w #$1f,d5 0.00% (32, 1536, 0)
$04f90e : move.w #$1f,d6 0.00% (32, 128, 0)
$04f912 : movea.l a1,a4 0.00% (32, 128, 32)
$04f914 : lea $ffc0(a2),a2 0.00% (32, 256, 32)
$04f918 : movea.l a2,a3 0.00% (32, 128, 0)
$04f91a : move.w d6,-(sp) 0.04% (1024, 8192, 3008)
$04f91c : moveq #0,d7 0.04% (1024, 4096, 0)
$04f91e : move.b (a4)+,d7 0.04% (1024, 12288, 1024)
$04f920 : bsr $55f1a 0.04% (1024, 16384, 1024)
$04f924 : move.w d7,d1 0.04% (1024, 4096, 2048)
$04f926 : moveq #0,d7 0.04% (1024, 4208, 1024)
$04f928 : move.b (a4)+,d7 0.04% (1024, 8192, 0)
$04f92a : bsr $55f1a 0.04% (1024, 8192, 0)
$04f92e : move.w d7,d2 0.04% (1024, 4096, 3072)
$04f930 : moveq #0,d7 0.04% (1024, 4152, 0)
$04f932 : move.b (a4)+,d7 0.04% (1024, 12288, 1026)
$04f934 : bsr $55f1a 0.04% (1024, 16384, 1024)
$04f938 : move.w d7,d3 0.04% (1024, 4096, 2048)
$04f93a : bfins d1,d4{16:16} 0.04% (1024, 12352, 0)
$04f93e : bfins d2,d4{21:16} 0.04% (1024, 12344, 0)
$04f942 : bfins d3,d4{27:16} 0.04% (1024, 12288, 0)
$04f946 : move.w d4,(a3)+ 0.04% (1024, 8192, 1024)
$04f948 : move.w (sp)+,d6 0.04% (1024, 8192, 0)
$04f94a : dbra d6,$4f91a 0.04% (1024, 8320, 0)
$04f94e : move.w (sp)+,d7 0.00% (32, 384, 32)
$04f950 : dbra d7,$4f902 0.00% (32, 392, 32)
$04f954 : rts
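One thing that stands out in the two listings above: correct_element only ever sees a byte (moveq #0,d7 / move.b (a4)+,d7), the per-row factor in d5 (one of 32 values) and the constant at $55f5c(pc). So, as a rough sketch in C and not a claim about how BM should do it, the calls coming from D_FlatCLUTGen_32x32 could in principle become lookups into a table that is built once. All names below (correct_element_model, row_scale, BLEND_K, correct_table) are made up for illustration, BLEND_K = 128 is only a placeholder for the constant I cannot see, and the elided clamp branches are not modelled. I assume the constant lies in 0..256, in which case the result stays within 0..$ffff anyway.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the word read from $55f5c(pc); its real value
   is not visible in the listing. Assumed to lie in 0..256. */
#define BLEND_K 128u

/* C model of the visible instructions of correct_element. The two elided
   clamp branches are left out: under the assumptions above the result
   cannot leave 0..0xffff, which the asserts check. */
static uint16_t correct_element_model(uint8_t value, uint16_t scale)
{
    uint32_t prod = (uint32_t)value * scale;          /* mulu.w d5,d7 */
    assert(prod <= 0xffffu);
    uint32_t x  = 0xffffu - prod;                     /* neg.l, add.l #$ffff */
    uint32_t sq = (x * x) >> 16;                      /* mulu.l, clr.w, swap */
    uint32_t y  = (sq * BLEND_K + x * (0x100u - BLEND_K)) >> 8;
    assert(y <= 0xffffu);
    return (uint16_t)(0xffffu - y);                   /* neg.l, add.l #$ffff */
}

/* Per-row factor as computed in D_FlatCLUTGen_32x32: (row << 8) / 31. */
static uint16_t row_scale(unsigned row)
{
    return (uint16_t)((row << 8) / 31u);
}

/* 32 rows x 256 byte values x 2 bytes = 16 KB, filled once. */
static uint16_t correct_table[32][256];

static void build_correct_table(void)
{
    for (unsigned row = 0; row < 32; row++)
        for (unsigned v = 0; v < 256; v++)
            correct_table[row][v] =
                correct_element_model((uint8_t)v, row_scale(row));
}

/* Stand-in for one "bsr $55f1a": a single indexed load. */
static inline uint16_t correct_element_lookup(unsigned row, uint8_t value)
{
    return correct_table[row][value];
}
```

The idea would be to call build_correct_table() once and then use correct_element_lookup(row, value) where the inner loop currently does bsr $55f1a. That trades the multiply chain and the bsr/rts pair for one memory read plus 16 KB of RAM. Whether it is actually a win on real hardware would need measuring, and it only holds as long as the constant at $55f5c does not change between calls.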
The worst thinking frame does this:
Time spent in profile = 0.26271s.
...
Visits/calls:
40.36% 1638.19% 2364 95949 _P_RecursiveSound
6.54% 7.00% 383 410 _PIT_CheckLine
6.20% 13.45% 363 788 _P_CheckSight
...
Executed instructions:
25.59% 25.65% 1046.64% 111765 112007 4570743 _P_RecursiveSound
18.80% 18.84% 68.44% 82082 82280 298864 _P_RunThinkers
10.96% 10.96% 10.96% 47860 47857 47857 * _BM_P_CrossBSPNode
8.93% 8.93% 8.93% 38979 39000 39000 _R_PointInSubsector
5.98% 5.99% 14.84% 26102 26156 64799 _P_CheckPosition
4.05% 4.06% 13.17% 17688 17726 57506 _P_LookForPlayers
3.07% 3.07% 3.07% 13400 13407 13407 _PIT_AddLineIntercepts_S
2.71% 2.81% 13.77% 11832 12271 60128 _BM_P_CheckSight
2.58% 2.59% 2.59% 11275 11295 11295 _BM_A_Mux1x2
2.26% 2.27% 5.78% 9873 9908 25224 _P_PathTraverse
2.04% 2.05% 2.05% 8910 8937 8937 _P_UpdateSpecials
1.41% 1.42% 1.78% 6179 6210 7794 _PIT_CheckLine
1.41% 1.41% 32.84% 6136 6136 143411 _P_SetMobjState
0.97% 0.98% 0.98% 4241 4262 4262 _PIT_CheckThing
0.82% 0.82% 17.15% 3575 3582 74909 _P_TryMove
0.78% 0.79% 17.59% 3421 3448 76829 _A_Look
0.66% 0.66% 14.43% 2904 2904 63032 _P_CheckSight
...
Sound recursion taking 1/4 of the CPU is a bit surprising. It's called from A_ReFire(); the callgraph is attached, and the code is below:
_P_RecursiveSound:
$02e4f0 : movem.l d2-d3/a2-a3,-(sp) 0.54% (2364, 94584, 6)
$02e4f4 : movea.l $14(sp),a3 0.54% (2364, 28448, 6)
$02e4f8 : move.w $1a(sp),d3 0.54% (2364, 18940, 7)
$02e4fc : move.w $8eee8,d0 0.54% (2364, 18912, 0)
$02e502 : cmp.w $40(a3),d0 0.54% (2364, 18968, 0)
$02e506 : bne.s $2e514 0.54% (2364, 9556, 7)
$02e508 : movea.w $12(a3),a1 0.41% (1793, 14368, 6)
$02e50c : movea.w d3,a0 0.41% (1793, 8, 0)
$02e50e : addq.l #1,a0 0.41% (1793, 7172, 4)
$02e510 : cmpa.l a1,a0 0.41% (1793, 7228, 0)
$02e512 : bge.s $2e584 0.41% (1793, 14364, 7)
$02e514 : move.w d0,$40(a3) 0.13% (571, 4596, 7)
$02e518 : move.w d3,d0 0.13% (571, 2272, 0)
$02e51a : addq.w #1,d0 0.13% (571, 2284, 10)
$02e51c : move.w d0,$12(a3) 0.13% (571, 4608, 10)
$02e520 : move.l $2bfb68,$14(a3) 0.13% (571, 11500, 10)
$02e528 : clr.w d2 0.13% (571, 2284, 0)
$02e52a : cmp.w $4c(a3),d2 0.13% (571, 4624, 0)
$02e52e : bge.s $2e584 0.13% (571, 2288, 1)
$02e530 : movea.w d2,a1 0.95% (4130, 16556, 12)
$02e532 : movea.l $50(a3),a0 0.95% (4130, 49560, 0)
$02e536 : movea.l (a0,a1.l*4),a2 0.95% (4130, 49672, 0)
$02e53a : movea.l $32(a2),a1 0.95% (4130, 49676, 0)
$02e53e : tst.l a1 0.95% (4130, 16520, 5)
$02e540 : beq.s $2e57c 0.95% (4130, 22908, 0)
$02e542 : movea.l $2e(a2),a0 0.58% (2530, 30416, 0)
$02e546 : move.l (a0),d0 0.58% (2530, 30488, 4)
$02e548 : cmp.l 4(a1),d0 0.58% (2530, 30376, 4)
$02e54c : bge.s $2e57c 0.58% (2530, 10120, 0)
$02e54e : move.l 4(a0),d0 0.55% (2402, 28824, 0)
$02e552 : cmp.l (a1),d0 0.55% (2402, 28840, 4)
$02e554 : ble.s $2e57c 0.55% (2402, 9756, 0)
$02e556 : move.l a0,d0 0.54% (2365, 9460, 4)
$02e558 : cmpa.l d0,a3 0.54% (2365, 9516, 0)
$02e55a : bne.s $2e55e 0.54% (2365, 14204, 4)
$02e55c : move.l a1,d0 0.27% (1183, 4788, 0)
$02e55e : btst #6,$11(a2) 0.54% (2365, 28396, 4)
$02e564 : beq.s $2e570 0.54% (2365, 18908, 0)
$02e566 : tst.w d3 0.00% (3, 12, 2)
$02e568 : bne.s $2e57c 0.00% (3, 24, 0)
[...]
$02e570 : movea.w d3,a0 0.54% (2362, 9448, 7)
$02e572 : move.l a0,-(sp) 0.54% (2362, 28456, 1)
$02e574 : move.l d0,-(sp) 0.54% (2362, 28400, 0)
$02e576 : bsr $2e4f0 0.54% (2362, 18896, 0)
$02e57a : addq.l #8,sp 0.54% (2362, 9448, 16)
$02e57c : addq.w #1,d2 0.95% (4130, 16576, 5)
$02e57e : cmp.w $4c(a3),d2 0.95% (4130, 33152, 0)
$02e582 : blt.s $2e530 0.95% (4130, 16584, 10)
$02e584 : movem.l (sp)+,d2-d3/a2-a3 0.54% (2364, 113544, 18)
$02e588 : rts 0.54% (2364, 28480, 0)
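Reading the counts in the listing above: of the 2364 entries, 1793 take the "already visited" path starting at $2e508 and only 571 reach $2e514 and go on to the line loop. So roughly three out of four calls pay for the movem.l save/restore and the argument pushes just to return straight away. A minimal C sketch of the alternative, testing the stamp before making the call, is below. It is a toy model under my own assumptions: the sector_t here is a simplified, hypothetical structure, sound-blocking lines and the traversal depth check are left out, and the function names are invented.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LINKS 4

/* Simplified, hypothetical sector: only what the flood needs. */
typedef struct sector {
    int validcount;                   /* stamp of the last flood that visited it */
    int nlinks;
    struct sector *links[MAX_LINKS];  /* neighbours through open two-sided lines */
} sector_t;

static int  validcount;               /* bumped once per flood */
static long calls;                    /* function entries, for comparison */

/* Variant A: test on entry, as the listing does. */
static void flood_check_on_entry(sector_t *sec)
{
    calls++;
    if (sec->validcount == validcount)
        return;                       /* already flooded */
    sec->validcount = validcount;
    for (int i = 0; i < sec->nlinks; i++)
        flood_check_on_entry(sec->links[i]);
}

/* Variant B: test before the call, so visited sectors cost no call at all. */
static void flood_check_before_call(sector_t *sec)
{
    calls++;
    sec->validcount = validcount;
    for (int i = 0; i < sec->nlinks; i++)
        if (sec->links[i]->validcount != validcount)
            flood_check_before_call(sec->links[i]);
}

/* Build a ring of n sectors, each linked to both of its neighbours. */
static void make_ring(sector_t *s, int n)
{
    for (int i = 0; i < n; i++) {
        s[i].validcount = 0;
        s[i].nlinks = 2;
        s[i].links[0] = &s[(i + 1) % n];
        s[i].links[1] = &s[(i + n - 1) % n];
    }
}

/* Run one flood from 'start' and return how many times 'flood' was entered. */
static long count_calls(void (*flood)(sector_t *), sector_t *start)
{
    assert(start != NULL);
    validcount++;                     /* new stamp, nothing is visited yet */
    calls = 0;
    flood(start);
    return calls;
}
```

On a ring of 8 sectors, variant A is entered 17 times (8 useful visits plus 9 immediate returns) and variant B 8 times. The real function also compares the traversal depth before returning, so the test that would have to move to the call site is a bit larger than in this toy, and the actual saving would need to be profiled.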