Eero Tamminen wrote: Here's a single-frame rendering from Doom1, when the player goes down the stairs towards a door with a transparent monster.
Thanks. There's a *lot* of useful new info here - some important recent changes such as these (below) haven't really shown up until now and I expected to see them sooner. The cache miss rates suggest I should do some more work there.
20.42% 20.42% 20.55% 1738 1738 1749 R_AddSpriteSpans
14.93% 14.93% 14.93% 1271 1271 1271 R_ViewTestSpriteLines
Eero Tamminen wrote: Note: post-processor shows init_font() calls because you've removed the framecounter() symbol from doom.sym. In reality they're framecounter() calls.
Disabling the text console stuff has also broken the window resolution indicator, which prints xres*yres when you resize the window. It isn't controlled by the build flag, so it shows a garbage rectangle now. It took me a while to figure out what that was. At first I thought it was Doom drawing a strange icon on resize.
Eero Tamminen wrote: 28.27% 1334 lowerwall_skip
22.17% 22.17% 1046 1046 R_DoColumnPerspCorrect
10.72% 506 trans_skip
7.20% 340 midwall_skip
6.42% 303 upperwall_skip
I'll need to think about that one since it's unusual to see. It's probably a lot of upper/lower walls which technically pass the wall occlusion test but are 90% below/behind something else and those columns get skipped by per-column occlusion tests during the edge stepping process.
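To make the skipping concrete, here is a minimal sketch in the style of vanilla Doom's per-column clip arrays. It is not the BM engine's DSP code: `wall_span_t`, `count_drawn_columns`, `clip_top` and `clip_bot` are names invented for the example, and the 16.16 edge stepping is an assumption.

```c
/* Hypothetical sketch: a wall span that passed the coarse occlusion test
   can still lose most of its columns to the per-column clip test. */
typedef struct {
    int x1, x2;         /* inclusive screen column range */
    int top, top_step;  /* 16.16 fixed-point top edge and per-column step */
    int bot, bot_step;  /* 16.16 fixed-point bottom edge and per-column step */
} wall_span_t;

/* clip_top[x] / clip_bot[x]: first and last rows still open in column x. */
int count_drawn_columns(const wall_span_t *w,
                        const short *clip_top, const short *clip_bot)
{
    int drawn = 0;
    int top = w->top, bot = w->bot;

    for (int x = w->x1; x <= w->x2; x++) {
        int y1 = top >> 16;
        int y2 = bot >> 16;

        if (y1 < clip_top[x]) y1 = clip_top[x];
        if (y2 > clip_bot[x]) y2 = clip_bot[x];

        if (y1 <= y2)
            drawn++;    /* column has visible height: it would be drawn */
        /* otherwise the height is zero or negative: the "skip" path */

        top += w->top_step;   /* edge stepping continues either way */
        bot += w->bot_step;
    }
    return drawn;
}
```

The point is that the edge stepping cost is paid for every column in the span, drawn or not, which is how the skip labels can dominate a profile.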
Eero Tamminen wrote: 59.35% 59.35% 59.35% 3851632 3851632 3851632 R_DoColumnPerspCorrect
23.52% 1526070 command_base
Sprites still use R_DoColumnPerspCorrect and don't occlude anything so they can really build up the cost of this function with many overlaps. I'm planning an alternate path for sprites with constant z and very little work except column top/bottom trimming.
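A rough sketch of what such a sprite path could look like, under the assumption that the sprite's projected top and bottom rows are computed once from its single z before the column loop. `column_run_t` and `sprite_column_trim` are placeholder names, not existing engine functions.

```c
/* Hypothetical constant-z sprite column: the projected rows are the same
   for every column, so the only per-column work is trimming against the
   clip limits for that column. Nothing is written back to the clip
   arrays, since sprites do not occlude anything. */
typedef struct {
    int y1, y2;   /* inclusive row range; y1 > y2 means nothing visible */
} column_run_t;

column_run_t sprite_column_trim(int spr_top, int spr_bot,
                                int clip_top, int clip_bot)
{
    column_run_t r;
    r.y1 = spr_top < clip_top ? clip_top : spr_top;
    r.y2 = spr_bot > clip_bot ? clip_bot : spr_bot;
    return r;
}
```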
Eero Tamminen wrote:
Here's then the part between render_end and render_begin i.e. "thinking", for next frame...
Time spent in profile = 0.21728s.
14.97% 66.60% 1272 5658 _P_MobjThinker
10.50% 10.50% 892 892 _P_PointOnDivlineSide
6.75% 7.27% 573 618 _PIT_CheckLine
4.14% 14.38% 352 1222 _PIT_AddLineIntercepts
3.77% 3.77% 320 320 _PIT_CheckThing
3.67% 312 copy16_d
3.38% 10.79% 287 917 _P_BlockThingsIterator
2.60% 3.15% 221 268 BM_P_CrossSubsector
2.41% 205 copy16
P_PointOnDivlineSide is a definite DSP candidate. PIT_CheckLine, PIT_AddLineIntercepts, PIT_CheckThing I think are the collision raytest/clipping and probably also DSP candidates in there.
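For reference, the side test boils down to the sign of a 2D cross product, which is the sort of multiply-heavy work a DSP tends to handle well. A minimal sketch, assuming plain integer map coordinates and a 64-bit intermediate instead of the 16.16 fixed-point shifts and early-outs the real function uses. `divline_sketch_t` and `point_on_divline_side` are illustrative names, not the engine's code.

```c
#include <stdint.h>

typedef struct {
    int32_t x, y;     /* a point on the dividing line */
    int32_t dx, dy;   /* direction of the dividing line */
} divline_sketch_t;

/* Returns 0 for the front side and 1 for the back side, following the
   vanilla Doom convention. Points exactly on the line count as back.
   Assumes map-scale coordinates so the 64-bit products cannot overflow. */
int point_on_divline_side(int32_t x, int32_t y, const divline_sketch_t *line)
{
    int64_t px = (int64_t)x - line->x;
    int64_t py = (int64_t)y - line->y;

    int64_t left  = (int64_t)line->dy * px;
    int64_t right = py * (int64_t)line->dx;

    return right < left ? 0 : 1;
}
```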
copy16* I *suspect* is from memcpy calls made from the V_CopyRect dirty rectangle updates for the status bar. That's a guess though.
Eero Tamminen wrote:...
12.08% 12.38% 20.44% 40126 41135 67932 _BM_P_CrossBSPNode
9.52% 9.53% 80.59% 31620 31674 267794 _P_MobjThinker
8.02% 8.04% 8.04% 26664 26705 26705 _P_PointOnDivlineSide
7.51% 7.51% 8.06% 24947 24969 26797 BM_P_CrossSubsector
6.82% 6.82% 6.82% 22655 22669 22669 _R_PointInSubsector
Ouch. More still to do here it seems!
Eero Tamminen wrote: 10.98% 11.00% 33.89% 12793 12822 39499 _P_BlockLinesIterator
Not sure what this is yet. Something to do with entities and the blockmap. There is some kind of AI interaction with the blockmap (a grid of cells indicating overlap with other stuff, sectors or walls IIRC; I don't use it in the BM engine).
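For context, vanilla Doom's blockmap is a uniform grid of 128x128 map-unit cells, each holding a list of the linedefs that touch it, and P_BlockLinesIterator walks the list for one cell. A minimal sketch of the cell lookup only, assuming integer map units rather than 16.16 fixed point; `blockmap_sketch_t` and `blockmap_cell` are made-up names for illustration.

```c
#define BLOCK_SHIFT 7   /* 128 map units per cell, as in vanilla Doom */

typedef struct {
    int org_x, org_y;   /* map coordinates of the grid origin */
    int width, height;  /* grid size in cells */
} blockmap_sketch_t;

/* Returns the cell index for a map position, or -1 if it lies outside
   the grid. Relies on arithmetic right shift for negative offsets,
   which is what common compilers provide. */
int blockmap_cell(const blockmap_sketch_t *bm, int x, int y)
{
    int bx = (x - bm->org_x) >> BLOCK_SHIFT;
    int by = (y - bm->org_y) >> BLOCK_SHIFT;

    if (bx < 0 || by < 0 || bx >= bm->width || by >= bm->height)
        return -1;

    return by * bm->width + bx;
}
```

Every moving entity that checks its surroundings does this lookup and then iterates the cell's line list, which would explain the iterator showing up under the thinker totals.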
Eero Tamminen wrote: DSP side:
96.06% 1.67% 63231 1102 P_CrossSubsector
I was going to say 'wow' but P_CrossSubsector is inlined as part of another loop and these are relative measurements for label visits. Still, it's a lot.
Eero Tamminen wrote: 1.20% 1.20% 787 787 TestLineSegVectorBisection
0.86% 567 P_CrossSubsector_doseg
0.74% 1.46% 484 964 InterceptVectors
0.73% 0.73% 480 480 Divs48_Real
Thankfully these ^^^ are being shortcut most of the time, because they are expensive. Divs48_Real is my last-resort floating point divide for when no other divide is enough.
Eero Tamminen wrote: DSP side:
11.43% 877 lowerwall_skip
10.08% 10.08% 774 774 R_DoColumnPerspCorrect
8.32% 639 upperwall_skip
7.18% 551 command_base
4.68% 359 midwall_skip
3.44% 264 R_CheckBBox_end
3.44% 5.33% 264 409 R_CheckBBox
Again, interesting to see so much skipping. Walls are passing the occlusion test, but most of their columns end up with zero or negative height and get skipped.
Eero Tamminen wrote: Significantly less DSP free than with Doom1 during the whole frame.
=> In Doom1 code, thinking takes slightly more CPU than rendering, in Doom2 rendering takes more time.
I'm going to speculate from the above that Doom2's extensive use of open areas (many sector joins with scene depth) and floor height changes is causing the occlusion testing to work ineffectively, and lots of wall scanning is going on between the player's eye and the first solid surface, without much drawing resulting in between.
This could probably be solved with a different kind of occlusion testing, but I'd have to think quite hard about that before mucking about with it.
Thanks for the detailed reports. I'll be digesting it all.