Bad Mood : Falcon030 'Doom'

All 680x0 related coding posts in this section please.

Moderators: simonsunnyboy, Mug UK, Zorro 2, Moderator Team

User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 2025
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Postby Eero Tamminen » Thu Jun 13, 2013 9:12 pm

dml wrote:However it does imply being stuck with a 320x168+40 display (the window can now be shrunk with +/-) since no other mode works with status bar and menus yet.


No problem. :)

dml wrote:I suspect the slowdown is a combination of very slow patch-based character drawing and unnecessary backbuffer blits. I figured it wasn't worth spending a lot of time on trying to fix now (esp. with audio, weapons etc. missing) but can be visited later, maybe with Laurent's compiled sprites for the font characters.


If my opinion matters in this at all, I would say that having first fairly complete game and finding out how much that slows it down would be nicer as game speed is fairly playable already.


dml wrote:I don't remember ever needing those messages in the game, although they can be amusing in network play.


Knowing in fight (when one doesn't necessarily have time to check statusbar stats) when one picked an item like superarmor could be useful, but it doesn't need to be done with messages. If changing border color be much cheaper, blinking the border with suitable color would be enough. Getting hit should have priority over item pickup in border color/palette changes though.

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Postby dml » Fri Jun 14, 2013 10:56 am

Eero Tamminen wrote:If my opinion matters in this at all, I would say that having first fairly complete game and finding out how much that slows it down would be nicer as game speed is fairly playable already.


I have added a makefile option -DALLOW_PLAYER_MESSAGES to control this - it is currently disabled because the cost is disproportional to everything else at the moment and it displays incorrectly if the window is shrunk. Note that I haven't hacked anything permanently out of the game code it's all still there. :)

Eero Tamminen wrote:Knowing in fight (when one doesn't necessarily have time to check statusbar stats) when one picked an item like superarmor could be useful, but it doesn't need to be done with messages. If changing border color be much cheaper, blinking the border with suitable color would be enough. Getting hit should have priority over item pickup in border color/palette changes though.


I would tend to agree. The player messages can probably be fixed by reimplementing the font drawing, inhibiting dirty rectangles on some items (like these messages, esp. if they are drawn to every frame) plus some other changes if we want them back.

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Postby dml » Fri Jun 14, 2013 12:11 pm

Eero, all those changes have been pushed this morning. While doing so I found a new lighting related bug related to surface deferral, per-surface lighting and sprites but it shouldn't get in your way.

[EDIT]

Bugs now fixed too.

User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 2025
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Postby Eero Tamminen » Sat Jun 15, 2013 7:47 am

dml wrote:Eero, all those changes have been pushed this morning.


Thanks here are some additional 100 frames profiles.

Doom1, from first shot:

Code: Select all

Time spent in profile = 29.95044s.
...
Calls:
  13.94%  36.51%      111023    290874   _P_MobjThinker
   8.03%               63974             R_BSPHyperPlane_RHS
   7.95%               63326             R_BSPHyperPlane_LHS
   5.19%               41347             copy16_d
   3.26%   3.54%       25969     28196   _PIT_CheckLine
   2.34%   2.89%       18642     23014   BM_P_CrossSubsector
   1.89%   1.89%       15021     15021   _SwapLONG
   1.84%   4.43%       14652     35306   _P_BlockThingsIterator
   1.75%   1.76%       13917     14016   _PIT_CheckThing
...
Executed instructions:
  35.95%                    19993875                       render_wall_1x1
   7.61%                     4234369                       R_VisPlaneShader
   6.26%                     3479370                       render_transparent_1x1
   5.72%   5.86%  10.17%     3180812   3256765   5656704   _BM_P_CrossBSPNode
   4.95%   4.95%  27.84%     2750820   2754400  15485864   _P_MobjThinker
   4.00%   4.00%   4.32%     2226388   2225983   2400095 * BM_P_CrossSubsector
   3.03%   3.04%   3.04%     1687399   1688318   1688318   _R_PointInSubsector
   2.75%   2.75%  31.15%     1529374   1530787  17324686   _P_RunThinkers
   2.44%   2.44%   3.17%     1355946   1357479   1761160   _V_DrawPatch
   1.79%   1.80%   1.80%      996688    999537    999537   stream_texture
   1.40%   1.41%   1.41%      779758    785088    785088   stack_visplane_area
   1.30%   1.30%   2.57%      724578    725723   1431523   _P_BlockLinesIterator
   1.22%   1.22%  11.98%      678444    679562   6663371   _P_LookForPlayers
   1.15%   1.15%   6.16%      640242    642329   3426405   _P_CheckPosition
   1.10%   1.10%  11.27%      611359    613191   6265747   _BM_P_CheckSight
...
Instruction cache misses:
  11.25%  11.27%  60.43%      843031    844947   4528987   _P_MobjThinker
  11.14%  11.15%  73.57%      834790    835371   5513926   _P_RunThinkers
   5.69%   5.70%  12.30%      426559    427171    921582   _P_BlockLinesIterator
   4.51%   4.52%  12.88%      337946    338494    965151   _BM_P_CheckSight
   4.09%   4.09%   4.57%      306181    306409    342192   _PIT_CheckLine
   3.86%   3.87%  18.20%      288935    289821   1364233   _P_CheckPosition
   3.39%   3.98%   8.37%      253989    298592    627137   _BM_P_CrossBSPNode
   3.26%   3.28%   4.38%      244370    245710    328474   BM_P_CrossSubsector
   3.09%   3.10%   3.10%      231484    232374    232374   _R_PointInSubsector
   2.86%   2.86%  15.61%      214002    214595   1170323   _P_LookForPlayers
   2.80%   2.80%  41.04%      209610    209886   3075801   _P_SetMobjState
   1.90%   1.90%  22.40%      142339    142589   1678973   _P_TryMove
   1.63%   1.64%   2.70%      122327    123170    202129   _V_DrawPatch
   1.57%   1.58%   4.68%      117968    118318    350860   _P_BlockThingsIterator
   1.49%   1.50%   1.55%      111974    112105    116454   R_AddSpriteSpans
   1.46%   1.46%   1.48%      109290    109485    111285   _PIT_CheckThing
   1.37%   1.37%   2.65%      102598    102868    198724   _P_SetThingPosition
   1.15%   1.16%  19.29%       86491     86660   1445814   _A_Chase
   1.14%   1.15%   1.15%       85626     85901     85901   R_ViewTestSpriteLines
   1.10%   1.11%   1.11%       82718     82930     82930   BM_P_CrossSubsector_vdelta
   1.07%   1.07%  17.34%       80280     80416   1299726   _A_Look


DSP side:

Code: Select all

Calls:
  90.83%   1.56%     5608460     96241   P_CrossSubsector
   1.11%   1.11%       68802     68802   TestLineSegVectorBisection
..
Used cycles:
  42.81%                   411418182                       command_base
  24.30%  24.30%  24.30%   233520730 233520730 233520730   R_DoColumnPerspCorrect
  10.35%                    99449026                       VPRenderPlaneDT_
   5.59%                    53726628                       ALGO_P_CrossBSPNode
   3.60%   4.81%   5.25%    34605384  46212916  50451124   P_CrossSubsector
   3.13%                    30101568                       R_VPLoadTexture
   1.58%   1.58%   1.58%    15163084  15163084  15163084   extract_subvisplane
   1.20%                    11511150                       P_CrossSubsector_doseg
   1.09%                    10487546                       R_ViewTestAddLine
   0.80%   0.80%   0.80%     7650564   7650564   7650564   Divs48_Real



Doom1, from first barrel explosion:

Code: Select all

Time spent in profile = 33.94979s.
...
Calls:
  13.34%  41.15%      120153    370599   _P_MobjThinker
   7.13%               64250             R_BSPHyperPlane_LHS
   6.60%   7.31%       59419     65840   _PIT_CheckLine
   6.01%               54137             R_BSPHyperPlane_RHS
   4.64%               41793             copy16_d
   3.22%   3.23%       29027     29111   _PIT_CheckThing
   2.94%   8.24%       26456     74223   _P_BlockThingsIterator
   1.77%   2.18%       15904     19648   BM_P_CrossSubsector
   1.69%   1.69%       15198     15198   _SwapLONG
   1.49%   1.49%       13442     13442   _memmove
   1.49%               13442             exit_4
   1.49%               13441             common2
   1.49%               13438             none
   1.49%               13437             common
   1.48%   1.48%       13368     13368   _SwapSHORT
   1.48%  24.11%       13328    217124   _P_SetMobjState
   1.37%  10.40%       12329     93639   _P_BlockLinesIterator
...
Executed instructions:
  38.77%                    24918387                       render_wall_1x1
   7.06%                     4535561                       render_transparent_1x1
   4.77%                     3067044                       R_VisPlaneShader
   4.63%   4.64%  28.15%     2978507   2981856  18096705   _P_MobjThinker
   4.34%   4.44%   7.54%     2789547   2854047   4849469   _BM_P_CrossBSPNode
   4.08%   4.09%   4.09%     2625187   2627746   2627746   _R_PointInSubsector
   2.87%   2.87%   3.11%     1844896   1846255   1996037   BM_P_CrossSubsector
   2.59%   2.59%  31.93%     1665032   1665437  20526537   _P_RunThinkers
   2.28%   2.28%   4.59%     1464849   1466863   2948707   _P_BlockLinesIterator
   2.12%   2.12%   2.76%     1363742   1365349   1774574   _V_DrawPatch
   1.71%   1.72%  10.25%     1101380   1104347   6587090   _P_CheckPosition
   1.45%   1.45%   1.45%      931968    934162    934162   stream_texture
   1.36%   1.36%   3.50%      874133    875427   2251784   _P_BlockThingsIterator
   1.29%   1.29%   1.56%      829408    830895   1002670   _PIT_CheckLine
   1.08%   1.08%   8.89%      695902    696843   5712980   _P_LookForPlayers
   0.97%   0.97%   0.98%      625213    626091    628929   _PIT_CheckThing
...
Instruction cache misses:
   9.62%   9.64%  61.90%      922271    923998   5932125   _P_MobjThinker
   9.47%   9.47%  75.01%      907568    907598   7188605   _P_RunThinkers
   9.16%   9.17%  20.26%      877598    878588   1942038   _P_BlockLinesIterator
   7.42%   7.42%   8.54%      710759    711439    818782   _PIT_CheckLine
   5.21%   5.23%  28.83%      499764    501128   2762979   _P_CheckPosition
   3.88%   3.90%   3.90%      372198    373712    373712   _R_PointInSubsector
   3.40%   3.41%   9.10%      325841    326374    871682   _BM_P_CheckSight
   2.45%   2.46%  44.86%      235176    235611   4298916   _P_SetMobjState
   2.29%   2.70%   5.69%      219231    258355    545468   _BM_P_CrossBSPNode
   2.27%   2.27%   7.82%      217116    217842    749573   _P_BlockThingsIterator
   2.24%   2.25%   3.00%      214857    215876    287168   BM_P_CrossSubsector
   2.22%   2.23%  10.93%      212755    213257   1047759   _P_LookForPlayers
   2.16%   2.17%  32.01%      207351    207772   3067453   _P_TryMove
   2.05%   2.05%   2.07%      196433    196882    198323   _PIT_CheckThing
   1.52%   1.53%   1.58%      145893    146227    151307   R_AddSpriteSpans
   1.48%   1.48%   2.84%      141431    141642    272033   _P_SetThingPosition
   1.41%   1.41%   1.41%      134723    134871    134871   _P_PointOnDivlineSide
   1.33%   1.33%  29.37%      127789    127862   2814464   _A_Chase
   1.30%   1.31%   2.15%      124527    125348    205601   _V_DrawPatch
   1.03%   1.03%   1.03%       98327     98549     98549   R_ViewTestSpriteLines
   1.02%   1.02%   2.56%       97788     98059    245138   _PIT_AddLineIntercepts
   0.91%   0.91%   0.91%       87278     87346     87346   _P_UnsetThingPosition
   0.89%   0.90%  26.30%       85616     85868   2520808   _P_Move


(In cycles _BM_P_CrossBSPNode & R_VisPlaneShader are higher than render_transparent_1x1.)

DSP side:

Code: Select all

Calls:
  88.50%   1.52%     4710505     80673   P_CrossSubsector
   1.71%               90904             lowerwall_skip
   1.19%   1.19%       63382     63382   R_DoColumnPerspCorrect
   1.08%   1.08%       57264     57264   TestLineSegVectorBisection
...
Used cycles:
  47.94%                   522208128                       command_base
  26.84%  26.84%  26.84%   292318754 292318754 292318754   R_DoColumnPerspCorrect
   6.52%                    71015000                       VPRenderPlaneDT_
   4.32%                    47102348                       ALGO_P_CrossBSPNode
   2.67%   3.55%   3.87%    29063934  38616684  42167484   P_CrossSubsector
   2.58%                    28127388                       R_VPLoadTexture
   1.14%   1.46%   1.73%    12388140  15874262  18862838   AddLowerWall
   1.00%   1.00%   1.00%    10887778  10887778  10887778   extract_subvisplane
   0.97%                    10598500                       R_ViewTestAddLine
   0.87%                     9470316                       P_CrossSubsector_doseg
   0.69%   0.82%  19.96%     7483598   8889780 217439208   AddMidWall
   0.63%   0.63%   0.63%     6821844   6821844   6821844   Divs48_Real
   0.53%   0.69%   7.74%     5796688   7471534  84262150   AddUpperWall


=> Average FPS with WinUAE CPU core & GCC 2.9.5 compiled code is 2.94 FPS.

User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 2025
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Postby Eero Tamminen » Sat Jun 15, 2013 8:09 am

The statusbar really adds feel to the game, the player face animation etc.

Here's profile stuff for Doom2, 100 frames from first shot:

Code: Select all

Time spent in profile = 39.87111s.
...
Calls:
  11.61%              143396             R_BSPHyperPlane_RHS
   9.39%              115932             R_BSPHyperPlane_LHS
   8.99%  35.12%      111067    433822   _P_MobjThinker
   5.68%   5.69%       70202     70231   _PIT_CheckThing
   3.55%   9.44%       43880    116597   _P_BlockThingsIterator
   3.39%               41824             copy16_d
   2.00%   2.96%       24703     36589   _PIT_CheckLine
   1.80%   2.36%       22198     29129   BM_P_CrossSubsector
   1.46%               18048             R_AddLine_loop
   1.46%               18045             R_AddLine_invisible
   1.39%   1.39%       17138     17138   _R_PointInSubsector
   1.14%  15.04%       14088    185834   _P_CheckPosition
   1.14%  16.91%       14083    208886   _P_TryMove
   1.11%   1.11%       13663     13663   _SwapLONG
...
Executed instructions:
  30.97%                    22423908                       render_wall_1x1
   7.38%                     5343922                       R_VisPlaneSkyShader
   6.48%   6.49%   6.49%     4693636   4698603   4698603   _R_PointInSubsector
   4.72%   4.82%   9.04%     3414306   3487777   6546944   _BM_P_CrossBSPNode
   3.89%                     2817930                       R_VisPlaneShader
   3.84%   3.84%   4.23%     2782285   2781724   3059767 * BM_P_CrossSubsector
   3.70%   3.70%  32.49%     2678514   2681865  23522139   _P_MobjThinker
   2.38%   2.38%  14.14%     1720782   1724666  10238294   _P_CheckPosition
   2.24%   2.24%   2.24%     1621508   1624425   1625257   _PIT_CheckThing
   2.16%   2.16%   4.53%     1560941   1563307   3276878   _P_BlockThingsIterator
   2.10%   2.10%  34.73%     1520267   1522489  25143315   _P_RunThinkers
   1.85%   1.87%   1.87%     1336021   1350661   1350661   stack_visplane_area
   1.78%   1.78%   1.78%     1287928   1287743   1287743 * stream_texture
   1.64%   1.65%   2.17%     1190529   1192088   1568510   _V_DrawPatch
   1.41%                     1023310                       render_transparent_1x1
   1.11%   1.11%   1.11%      803152    803194    803194   _P_UpdateSpecials
   1.06%   1.07%   2.54%      770050    771319   1840368   _P_BlockLinesIterator
...
Instruction cache misses:
   7.60%   7.62%  59.31%      848602    850402   6621867   _P_MobjThinker
   7.42%   7.43%  27.60%      828140    829926   3081975   _P_CheckPosition
   7.30%   7.31%  66.87%      815072    816048   7466448   _P_RunThinkers
   5.16%   5.18%   5.18%      575683    578327    578327   _R_PointInSubsector
   4.12%   4.13%  10.08%      460128    460789   1125041   _P_BlockLinesIterator
   3.89%   3.91%   8.13%      434833    436003    907806   _P_BlockThingsIterator
   3.83%   3.85%   3.85%      427976    429534    430026   _PIT_CheckThing
   2.96%   2.96%   4.61%      330308    330683    514506   _PIT_CheckLine
   2.63%   2.64%   2.72%      293849    294417    303334   R_AddSpriteSpans
   2.62%   2.98%   6.57%      292572    333045    734026   _BM_P_CrossBSPNode
   2.59%   2.60%   9.17%      289570    290183   1024090   _BM_P_CheckSight
   2.40%   2.41%   3.59%      267914    269344    400886   BM_P_CrossSubsector
   2.35%   2.36%  32.26%      262896    263288   3602212   _P_TryMove
   1.73%   1.74%  46.56%      193528    193865   5198499   _P_SetMobjState
   1.73%                      192810                       build_ssector
   1.69%   1.69%   1.69%      188660    189155    189155   R_ViewTestSpriteLines
   1.64%   1.65%  29.49%      183636    184190   3292734   _P_Move
   1.57%                      174989                       render_wall_1x1
   1.33%                      148106                       R_AddLine_loop
   1.27%   1.27%   8.86%      141740    142067    989333   _P_LookForPlayers
   1.17%   1.18%   1.18%      131174    131819    131819   BM_P_CrossSubsector_vdelta
   1.13%   1.13%  25.64%      125960    126108   2862414   _P_NewChaseDir
   1.07%                      119730                       R_AddLine_prelight
   1.07%   1.08%   1.77%      119689    120426    197624   _V_DrawPatch
   0.98%   0.98%  24.42%      109246    109348   2726608   _P_TryWalk
   0.95%   0.95%   4.48%      106365    106557    500710   R_FlushDeferredSurfaces


DSP side:

Code: Select all

Calls:
  88.12%   1.47%     7110659    118282   P_CrossSubsector
   1.02%   1.02%       82260     82260   TestLineSegVectorBisection
   0.93%               75056             lowerwall_skip
   0.88%               70801             R_EndAddSurface
   0.84%   0.84%       68055     68055   R_DoColumnPerspCorrect
   0.79%               64020             P_CrossSubsector_doseg
   0.66%               53610             upperwall_skip
   0.63%               50954             command_base
   0.62%   1.23%       49786     99025   InterceptVectors
   0.61%   0.61%       49332     49332   Divs48_Real

Used cycles:
  45.53%                   582502260                       command_base
  20.13%  20.13%  20.13%   257568156 257568156 257568156   R_DoColumnPerspCorrect
   5.67%                    72564298                       R_VPRenderSky
   4.85%                    62028210                       VPRenderPlaneDT_
   4.35%                    55608642                       ALGO_P_CrossBSPNode
   3.43%   4.56%   4.99%    43851300  58338284  63831620   P_CrossSubsector
   3.04%                    38872398                       R_VPLoadTexture
   2.20%                    28117066                       R_ViewTestAddLine
   1.99%   1.99%   1.99%    25519674  25519674  25519674   extract_subvisplane
   1.12%                    14334150                       P_CrossSubsector_doseg
   0.91%                    11586558                       project_node
   0.90%   1.14%   5.80%    11496606  14566804  74239060   AddLowerWall
   0.75%   0.75%   0.75%     9582008   9582008   9582008   Divs48_Real
   0.64%   0.79%  15.45%     8194038  10118832 197598716   AddMidWall
   0.57%                     7287198                       R_CheckBBoxPair
   0.56%   0.73%   1.85%     7146406   9282138  23686066   AddUpperWall


=> Average FPS 2.5.

It's a good thing that the levels are different in Doom1 & Doom2, as here one can see the sky handling taking several percentages both on CPU & DSP side.

Both with Doom1 & Doom2 a large part of DSP cycles is just waiting stuff at command base. I'll add separate profiling for rendering and thinking part for the last frame to the profiling script.

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Postby dml » Sat Jun 15, 2013 8:23 am

Eero Tamminen wrote:Thanks here are some additional 100 frames profiles.


Good stuff. I'm currently revising this one:

24.30% 24.30% 24.30% 233520730 233520730 233520730 R_DoColumnPerspCorrect

...as I have been further simplifying textured column calculations in an effort to speed things up - on the DSP side at least, and *maybe* allow the wall texture mapping to be deferred until the scene end (and so reduce cache misses by doing all walls at once, in state-sorted order).

It relies on the assumption that every texel is 1 unit size in the world, and so has a direct relationship to scene depth or [z].

i.e. Since perspective projection is [scale = (1/z)], and texel rate is [1/scale], then texel rate is just [z], no other calcs needed. This greatly simplifies the texture v, dv calculation per column, which previously used something like [tex_vmag/(scan_y2-scan_y1)]. It could even remove the need to scan wall top/bottom edges completely, if occlusion buffer testing/filling wasn't involved, but that would need more reorganisation and I'm not sure it's worth it yet for other reasons. I've been wanting to make this change for a while but it was more difficult to arrange with the previous code.

Eero Tamminen wrote:(In cycles _BM_P_CrossBSPNode & R_VisPlaneShader are higher than render_transparent_1x1.)


The transparency rendering is costly by screen area, but on average it is very much a background cost. It has occasional strong spiking behaviour because it just takes one or two sprites to move near the viewplane for it to dominate.

Eero Tamminen wrote:=> Average FPS with WinUAE CPU core & GCC 2.9.5 compiled code is 2.94 FPS.


I'm not too worried about the FPS on the current config since it is running a full size 320x window and doesn't have AI mods enabled to keep it demo-playback-friendly i.e. close to worst case performance. I'll try to set something up so testing 'realistic' resolutions (maybe the JagDoom one since it worked pretty well) is easier via Makefile, even if the status bar looks wrong in that mode.

Not much I can do about the cost of PC-demo-friendly settings except more game code optimizations but there will be a limit to that area, fast approaching. Best solution is to get a 12Hz Falcon demo recorded so we can profile with the expected tick rate and not the PC one. I don't know if this works yet but since the menu is functioning now it's worth a shot. (I suspect the AI mods I have made could break even locally recorded demos but that would at least be fixable).

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Postby dml » Sat Jun 15, 2013 8:31 am

Eero Tamminen wrote:The statusbar really adds feel to the game, the player face animation etc.


:-)

V_DrawPatch and V_CopyRect are two which relate to status bar updates. However with the current config they are getting swamped and not really registering (i.e. constant cost). When configured for max performance they do become more dominant and reduce the FPS a bit.

Eero Tamminen wrote:It's a good thing that the levels are different in Doom1 & Doom2, as here one can see the sky handling taking several percentages both on CPU & DSP side.


The CPU side of the sky shader is still quite costly, partly because the texture format isn't optimized for direct drawing in 'fullbright' mode (i.e. no lighting indirection needed) and the texture format is column-major - the default for walls - when visplane (sky,ceiling,floor) shading occurs as row-major spans. I have left this for now to do other more important things but will return later.

Eero Tamminen wrote:Both with Doom1 & Doom2 a large part of DSP cycles is just waiting stuff at command base. I'll add separate profiling for rendering and thinking part for the last frame to the profiling script.


The time spent at command_base can be misleading - there are vast tracts of time where the DSP isn't really asked or expected to do anything, and it just sits there until it gets work to do. When it does get work it can still saturate. The only way to tell is the spin points on the CPU side start to register a high execution ratio vs previous opcodes. I've been checking these by eye, although not reviewed them for a while now and they have changed a lot...

User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 2025
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Postby Eero Tamminen » Sat Jun 15, 2013 8:37 am

Regarding V_DrawPatch(), Doom startup has changed interestingly, it goes now:

Code: Select all

Time spent in profile = 78.00320s.
...
Calls:
  44.58%  44.58%      776914    776914   strcmp_8
   8.11%   8.11%      141292    141292   correct_element
...
Executed instructions:
  21.33%  21.36%  21.82%    34407229  34457110  35197497   _V_DrawPatch
  14.40%  14.42%  14.42%    23236313  23258270  23258270   flat_generate_mips_8bitsrc
  14.27%  14.27%  14.27%    23011853  23026814  23026814   strcmp_8
  11.65%   9.41%   9.41%    18797358  15178238  15178238 * D_PatchRender_8_8
   4.63%   4.63%  17.13%     7467266   7465478  27627481 * hrsc_locate_q
   3.66%   3.66%   3.66%     5898360   5904792   5904792   D_TextureRemapInit
   3.44%   3.46%   4.27%     5553897   5577822   6886657   _BM_S_Init
...
Instruction cache misses:
  29.15%  29.13%  51.73%     2741956   2740194   4866441 * hrsc_locate_q
  22.45%  22.51%  22.51%     2112063   2117442   2117442   strcmp_8
   6.47%                      608676                       ROM_TOS
   4.08%                      383639                       _SwapLONG


=> V_DrawPatch() takes more CPU during startup than mipmap generation...

If this is also statusbar, why it's being drawn so much during startup? Is it drawn on every frame although it doesn't change?

Doom1:

Code: Select all

Time spent in profile = 45.71541s.
...
Calls:
  20.58%  20.58%      157215    157215   strcmp_8
  18.09%  18.09%      138219    138219   correct_element
   9.10%   9.10%       69502     69502   _SwapLONG
   6.28%               47958             copy16_d
   5.94%               45378             copy16
   4.90%   4.90%       37420     37420   _SwapSHORT
...
Executed instructions:
  23.78%  23.80%  23.80%    22462729  22484712  22484712   flat_generate_mips_8bitsrc
  16.66%  16.68%  17.05%    15733137  15758565  16102930   _V_DrawPatch
   9.81%   3.98%   3.98%     9270334   3757356   3757356 * D_PatchRender_8_8
   6.04%   6.04%   6.04%     5701748   5707974   5707974   D_TextureRemapInit
   5.88%   5.90%   7.29%     5553897   5577746   6886555   _BM_S_Init
   5.02%   5.03%   5.03%     4744045   4746639   4746639   strcmp_8
...
Instruction cache misses:
  12.96%  12.98%  23.59%      580409    581110   1056342   hrsc_locate_q
  12.09%                      541307                       ROM_TOS
  10.53%  10.56%  10.56%      471762    472745    472745   strcmp_8
   7.45%                      333556                       _SwapLONG
   6.96%   6.96%  12.46%      311567    311637    557945   create_palettes_64levels
   5.52%   5.56%   5.56%      247362    249071    249071   correct_element
   4.87%   4.89%  15.84%      218141    218864    709408   _R_GenerateLookup


Do the strcmp_8 lookups have the same optimization I did for C-side code?

(You might add some note about these to TODO list.)

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Postby dml » Sat Jun 15, 2013 9:43 am

Eero Tamminen wrote:Regarding V_DrawPatch(), Doom startup has changed interestingly, it goes now:[code]
Time spent in profile = 78.00320s.
=> V_DrawPatch() takes more CPU during startup than mipmap generation...
If this is also statusbar, why it's being drawn so much during startup? Is it drawn on every frame although it doesn't change?


V_DrawPatch is a generic patch drawing routine belonging to the Doom game code (or 'virtual rendering' layer, not exactly device-specific although it is a bit since it assumes a chunky 8bit framebuffer).

So this is most probably the title screen being drawn on each refresh. It is quite slow.

Eero Tamminen wrote:Do the strcmp_8 lookups have the same optimization I did for C-side code?
(You might add some note about these to TODO list.)


This I can't remember - there are some string8 copying optimizations in there but I don't know if the string compare is one of them. I'll take a look. I'll update the todo accordingly.

User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 2025
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Postby Eero Tamminen » Sat Jun 15, 2013 10:05 am

Here's single frame rendering from Doom1, when the player goes down the stairs towards a door with transparent monster.

CPU side:

Code: Select all

Time spent in profile = 0.20226s.
...
Calls:
  31.86%                 496             R_BSPHyperPlane_RHS
  24.98%                 389             R_BSPHyperPlane_LHS
   3.15%                  49             R_AddLine_loop
   3.15%                  49             R_AddLine_invisible
   2.57%                  40             init_font
...
Executed instructions:
  64.98%                      311297                       render_wall_1x1
  21.03%                      100762                       render_transparent_1x1
   2.99%                       14319                       R_VisPlaneShader
   2.80%                       13408                       stack_transparent
...
Instruction cache misses:
  20.42%  20.42%  20.55%        1738      1738      1749   R_AddSpriteSpans
  14.93%  14.93%  14.93%        1271      1271      1271   R_ViewTestSpriteLines
   7.53%                         641                       render_wall_1x1
   5.43%                         462                       build_ssector
   4.41%                         375                       R_AddLine_loop
   4.36%                         371                       render_transparent_1x1
   4.02%   4.02%  20.83%         342       342      1773   R_FlushDeferredSurfaces
   3.24%                         276                       R_AddLine_prelight
   3.01%                         256                       init_font


Note: post-processor shows init_font() calls because you've remove framecounter() symbol from doom.sym. In reality they're framecounter() calls.

DSP side:

Code: Select all

Calls:
  28.27%                1334             lowerwall_skip
  22.17%  22.17%        1046      1046   R_DoColumnPerspCorrect
  10.72%                 506             trans_skip
   7.20%                 340             midwall_skip
   6.42%                 303             upperwall_skip
   3.01%                 142             command_base
...
Used cycles:
  59.35%  59.35%  59.35%     3851632   3851632   3851632   R_DoColumnPerspCorrect
  23.52%                     1526070                       command_base
   5.11%                      331376                       VPRenderPlaneDT_
   2.30%   3.08%   3.18%      149252    200170    206240   AddLowerWall
   1.50%                       97662                       R_VPLoadTexture
   1.10%                       71448                       R_ViewTestAddLine
   1.05%   1.26%  31.34%       68336     81768   2034002   AddMidWall


It's probably not the most demanding thing and there aren't that many monsters (alive), but command_base is <1/4 for the rendering part.

Here's then the part between render_end and render_begin i.e. "thinking", for next frame...

CPU side:

Code: Select all

Time spent in profile = 0.21728s.
...
Calls:
  14.97%  66.60%        1272      5658   _P_MobjThinker
  10.50%  10.50%         892       892   _P_PointOnDivlineSide
   6.75%   7.27%         573       618   _PIT_CheckLine
   4.14%  14.38%         352      1222   _PIT_AddLineIntercepts
   3.77%   3.77%         320       320   _PIT_CheckThing
   3.67%                 312             copy16_d
   3.38%  10.79%         287       917   _P_BlockThingsIterator
   2.60%   3.15%         221       268   BM_P_CrossSubsector
   2.41%                 205             copy16
   1.86%  23.52%         158      1998   _P_BlockLinesIterator
   1.74%   1.74%         148       148   _SwapLONG
   1.59%  46.46%         135      3947   _P_SetMobjState
   1.55%   1.55%         132       132   _SwapSHORT
...
Executed instructions:
  12.08%  12.38%  20.44%       40126     41135     67932   _BM_P_CrossBSPNode
   9.52%   9.53%  80.59%       31620     31674    267794   _P_MobjThinker
   8.02%   8.04%   8.04%       26664     26705     26705   _P_PointOnDivlineSide
   7.51%   7.51%   8.06%       24947     24969     26797   BM_P_CrossSubsector
   6.82%   6.82%   6.82%       22655     22669     22669   _R_PointInSubsector
   6.44%   6.45%  20.22%       21394     21439     67174   _P_BlockLinesIterator
   5.28%   5.29%  86.24%       17551     17572    286552   _P_RunThinkers
   4.00%   4.00%   5.21%       13286     13297     17310   _V_DrawPatch
   3.59%   3.60%  11.14%       11935     11946     37025   _PIT_AddLineIntercepts
   3.06%   3.06%   8.40%       10154     10174     27917   _P_BlockThingsIterator
   2.86%   2.87%  18.16%        9515      9529     60335   _P_CheckPosition
   2.28%   2.28%  22.72%        7571      7571     75503   _BM_P_CheckSight
   2.26%   2.26%   2.62%        7524      7512      8710 * _PIT_CheckLine
   2.22%   2.22%  24.55%        7370      7370     81569   _P_LookForPlayers
   2.07%   2.09%   2.09%        6889      6930      6930   _PIT_CheckThing
   1.69%   1.69%   1.69%        5620      5620      5620   _P_UpdateSpecials
   1.34%   1.35%   1.35%        4440      4488      4488   _P_InterceptVector
   1.16%   1.16%   3.25%        3862      3869     10813   _PIT_AddThingIntercepts
   1.13%   1.13%  65.36%        3741      3741    217193   _P_SetMobjState
   0.94%   0.95%  20.38%        3140      3160     67713   _P_PathTraverse
   0.80%   0.80%   0.80%        2664      2671      2671   _SwapLONG
...
Instruction cache misses:
  10.98%  11.00%  33.89%       12793     12822     39499   _P_BlockLinesIterator
   9.76%   9.77%   9.77%       11369     11389     11389   _P_PointOnDivlineSide
   8.46%   8.48%  82.67%        9855      9883     96347   _P_MobjThinker
   8.29%   8.30%  91.52%        9666      9675    106659   _P_RunThinkers
   6.31%   6.31%  16.30%        7352      7358     19002   _PIT_AddLineIntercepts
   5.90%   5.89%   6.59%        6878      6867      7675 * _PIT_CheckLine
   3.82%   3.83%  22.24%        4455      4461     25919   _P_CheckPosition
   3.54%   3.54%  10.10%        4120      4120     11771   _BM_P_CheckSight
   2.84%   2.84%   2.84%        3306      3312      3312   _R_PointInSubsector
   2.68%   2.69%   8.73%        3128      3139     10179   _P_BlockThingsIterator
   2.68%   3.20%   6.56%        3127      3735      7651   _BM_P_CrossBSPNode
   2.61%   2.62%   3.36%        3038      3050      3916   BM_P_CrossSubsector
   2.19%   2.21%   2.21%        2555      2578      2578   _P_InterceptVector
   2.08%   2.08%  12.30%        2420      2420     14330   _P_LookForPlayers


DSP side:

Code: Select all

Calls:
  96.06%   1.67%       63231      1102   P_CrossSubsector
   1.20%   1.20%         787       787   TestLineSegVectorBisection
   0.86%                 567             P_CrossSubsector_doseg
   0.74%   1.46%         484       964   InterceptVectors
   0.73%   0.73%         480       480   Divs48_Real

Used cycles:
  80.46%                     5608938                       command_base
   9.89%                      689700                       ALGO_P_CrossBSPNode
   5.60%   7.46%   8.14%      390326    519994    567628   P_CrossSubsector
   1.85%                      128634                       P_CrossSubsector_doseg
   1.31%   1.31%   1.31%       91534     91534     91534   Divs48_Real
   0.52%   0.52%   0.52%       36202     36202     36202   TestLineSegVectorBisection



Then Doom2, from the place where the flying demon goes away from the player...

Rendering, CPU side:

Code: Select all

Time spent in profile = 0.22608s.
...
Calls:
  29.92%                1961             R_BSPHyperPlane_RHS
  21.64%                1418             R_BSPHyperPlane_LHS
   3.54%                 232             R_AddLine_loop
   3.54%                 232             R_AddLine_invisible
   2.01%                 132             R_BSPHyperPlane
   1.83%   2.91%         120       191   render_wall
   1.62%                 106             R_PopBSPNode
   1.60%   1.60%         105       105   cache_resource
   1.59%   1.59%         104       104   add_wall_segment
...
Executed instructions:
  48.87%                      218557                       render_wall_1x1
  15.21%                       68008                       R_VisPlaneSkyShader
   5.48%                       24524                       R_VisPlaneShader
   2.98%   3.02%   3.02%       13317     13488     13488   stack_visplane_area
   2.89%   2.91%   2.91%       12944     12998     12998   stream_texture
   2.81%                       12553                       render_transparent_1x1
   2.13%   2.13%   2.20%        9504      9511      9847   R_AddSpriteSpans
   1.86%                        8324                       R_AddLine_loop
   1.74%                        7788                       R_BSPHyperPlane
   1.57%   1.57%   3.87%        7014      7014     17316   get_ssector
   1.34%   1.34%   1.34%        5972      5990      5990   R_ViewTestSpriteLines
   1.28%                        5718                       stack_transparent
   1.24%                        5532                       build_ssector
   1.02%   1.02%   1.02%        4576      4576      4576   add_wall_segment
   1.01%                        4518                       R_BSPHyperPlane_RHS
   0.92%   0.92%   0.92%        4100      4120      4120   initialise_freetable
   0.73%                        3273                       R_BSPHyperPlane_LHS
   0.69%   0.69%   0.69%        3079      3086      3086   init_stategroups
   0.58%   0.58%  52.52%        2589      2589    234881   R_FlushDeferredSurfaces
   0.56%                        2490                       R_AddLine_prelight
...
Instruction cache misses:
  14.23%  14.24%  14.79%        4520      4523      4699   R_AddSpriteSpans
   8.27%                        2626                       build_ssector
   7.73%   7.76%   7.76%        2457      2466      2466   R_ViewTestSpriteLines
   6.99%                        2220                       render_wall_1x1
   6.28%                        1995                       R_AddLine_loop
   5.09%                        1616                       R_AddLine_prelight
   4.53%   4.53%  20.81%        1438      1438      6610   R_FlushDeferredSurfaces
   3.57%   3.57%  29.98%        1133      1133      9523   R_SubSectorTryFlush
   3.20%                        1017                       R_AddSurface_loop
   3.18%   3.18%   3.18%        1010      1010      1010   add_wall_segment
   2.41%   2.39%   2.39%         764       759       759 * cache_resource


DSP side:

Code: Select all

Calls:
  11.43%                 877             lowerwall_skip
  10.08%  10.08%         774       774   R_DoColumnPerspCorrect
   8.32%                 639             upperwall_skip
   7.18%                 551             command_base
   4.68%                 359             midwall_skip
   3.44%                 264             R_CheckBBox_end
   3.44%   5.33%         264       409   R_CheckBBox
   3.44%                 264             oct_4
   3.37%                 259             project_node
   3.34%                 256             R_EndAddSurface
   3.32%                 255             R_SetSSectorLuma
   2.83%                 217             R_CheckBBox_pass
   2.76%   2.76%         212       212   divs_x1_a
   2.53%   2.53%         194       194   new_luminance
   2.38%                 183             trans_skip
   1.72%                 132             R_CheckBBoxPair
...
Used cycles:
  35.12%  35.12%  35.12%     2547318   2547318   2547318   R_DoColumnPerspCorrect
  16.58%                     1202712                       command_base
  12.78%                      926760                       R_VPRenderSky
   7.65%                      555242                       VPRenderPlaneDT_
   5.40%                      391374                       R_VPLoadTexture
   5.11%                      370884                       R_ViewTestAddLine
   3.45%   3.45%   3.45%      250076    250076    250076   extract_subvisplane
   2.15%                      156034                       project_node
   2.05%   2.53%  11.02%      148750    183806    799228   AddLowerWall
   1.30%                       94170                       R_CheckBBoxPair
   1.16%   1.52%   4.03%       84494    110158    292004   AddUpperWall
   1.01%   1.24%  25.73%       73286     89676   1866680   AddMidWall
   0.90%   0.90%   0.90%       65622     65622     65622   R_BufferSurface
   0.89%   1.28%   1.32%       64306     92870     95964   init_addwall


Significantly less DSP free than with Doom1 during the whole frame.

Thinking, CPU side:

Code: Select all

Time spent in profile = 0.17653s.
...
Calls:
  16.45%  60.54%        1126      4145   _P_MobjThinker
   8.69%   8.69%         595       595   _PIT_CheckThing
   7.13%                 488             copy16_d
   5.37%  14.06%         368       963   _P_BlockThingsIterator
   3.94%   6.10%         270       418   _PIT_CheckLine
   2.92%   3.80%         200       260   BM_P_CrossSubsector
   2.37%   2.37%         162       162   _R_PointInSubsector
   1.88%  28.41%         129      1945   _P_TryMove
   1.88%  25.08%         129      1717   _P_CheckPosition
   1.87%   1.87%         128       128   _memmove
   1.87%   1.87%         128       128   _SwapLONG
...
Executed instructions:
  15.37%  15.38%  15.38%       43414     43460     43460   _R_PointInSubsector
  11.04%  11.29%  21.27%       31196     31889     60109   _BM_P_CrossBSPNode
   9.62%   9.61%  78.28%       27186     27155    221179 * _P_MobjThinker
   9.16%   9.16%   9.99%       25881     25881     28220   BM_P_CrossSubsector
   5.53%   5.55%  33.12%       15636     15691     93579   _P_CheckPosition
   5.46%   5.50%  84.17%       15435     15529    237805   _P_RunThinkers
   4.93%   4.94%   4.94%       13932     13963     13963   _PIT_CheckThing
   4.67%   4.67%   9.61%       13182     13189     27152   _P_BlockThingsIterator
   3.78%   3.79%   5.06%       10687     10698     14283   _V_DrawPatch
   2.86%   2.86%   2.86%        8080      8084      8084   _P_UpdateSpecials
   2.39%   2.39%   5.56%        6752      6759     15701   _P_BlockLinesIterator
   1.80%   1.81%  23.09%        5092      5117     65226   _BM_P_CheckSight
   1.68%   1.69%   3.16%        4760      4764      8942   _PIT_CheckLine
   1.47%   1.47%  20.75%        4154      4161     58622   _P_LookForPlayers
   1.39%   1.39%  34.11%        3916      3916     96371   _P_Move
   1.30%   1.31%  38.42%        3687      3714    108544   _P_TryMove
   1.12%   1.12%  62.32%        3154      3161    176080   _P_SetMobjState
...
Instruction cache misses:
  10.45%  10.44%  77.70%        8654      8644     64326 * _P_MobjThinker
  10.08%  10.13%  88.22%        8345      8387     73036   _P_RunThinkers
   9.20%   9.23%  36.05%        7618      7644     29843   _P_CheckPosition
   6.52%   6.55%   6.55%        5398      5419      5419   _R_PointInSubsector
   5.09%   5.10%  12.12%        4215      4218     10037   _P_BlockLinesIterator
   4.79%   4.81%   4.81%        3963      3980      3980   _PIT_CheckThing
   4.64%   4.64%   9.45%        3841      3844      7824   _P_BlockThingsIterator
   4.35%   4.35%   7.03%        3601      3604      5819   _PIT_CheckLine
   3.33%   3.34%  11.32%        2757      2769      9374   _BM_P_CheckSight
   3.20%   3.69%   7.98%        2652      3055      6605   _BM_P_CrossBSPNode
   3.09%   3.11%  42.64%        2560      2574     35295   _P_TryMove
   2.93%   2.93%   4.29%        2428      2428      3550   BM_P_CrossSubsector
   2.34%   2.34%  58.96%        1936      1939     48813   _P_SetMobjState
   1.99%   1.99%  37.65%        1646      1646     31165   _P_Move
   1.65%   1.65%  11.31%        1364      1367      9362   _P_LookForPlayers
   1.44%   1.44%   2.40%        1194      1194      1985   _P_BoxOnLineSide
   1.42%   1.43%   2.34%        1175      1181      1939   _V_DrawPatch
   1.35%   1.36%   1.36%        1116      1122      1122   BM_P_CrossSubsector_vdelta
   1.35%   1.35%  32.22%        1114      1114     26676   _P_NewChaseDir


DSP side:

Code: Select all

Calls:
  96.26%   1.64%       64280      1093   P_CrossSubsector
   1.16%   1.16%         775       775   TestLineSegVectorBisection
   0.90%                 604             P_CrossSubsector_doseg
   0.69%   1.36%         459       909   InterceptVectors
   0.67%   0.67%         450       450   Divs48_Real

Used cycles:
  79.03%                     4476186                       command_base
   8.97%                      508272                       ALGO_P_CrossBSPNode
   7.00%   9.41%  10.29%      396552    532708    582936   P_CrossSubsector
   2.38%                      134836                       P_CrossSubsector_doseg
   1.54%   1.54%   1.54%       87394     87394     87394   Divs48_Real
   0.63%   0.63%   0.63%       35650     35650     35650   TestLineSegVectorBisection


=> In Doom1 code, thinking takes slightly more CPU than rendering, in Doom2 rendering takes more time.

User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 2025
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Postby Eero Tamminen » Sat Jun 15, 2013 11:01 am

dml wrote:So this is most probably the title screen being drawn on each refresh. It is quite slow.


I tested trivial optimization of only drawing page lumps in d_main.c just once, and it seems to work fine.

However, Doom apparently does some wait before starting the timedemo, so although v_DrawPatch disappeared, the startup situation didn't much change:

Code: Select all

Time spent in profile = 77.84691s.

Calls:
- max = 777056, in strcmp_8 at 0x46710, on line 11741
- 2806417 in total
Executed instructions:
- max = 1966080, in D_TextureRemapInit+14 at 0x48cf4, on line 13125
- 141759469 in total
Used cycles:
- max = 27189548, in _BM_T_GetTicks+6 at 0x440a2, on line 10334
- 1248858632 in total
Instruction cache misses:
- max = 1356960, in strcmp_8 at 0x46710, on line 11742
- 20701302 in total

Calls:
  27.68%  27.68%      776903    776903   strcmp_8
  16.13%  16.13%      452620    452620   _BM_T_GetTicks
  16.12%  32.23%      452358    904629   _I_GetTime
...
Executed instructions:
  16.39%  16.41%  16.41%    23236313  23258223  23258223   flat_generate_mips_8bitsrc
  16.23%  16.24%  16.24%    23011853  23025414  23025414   strcmp_8
  13.26%  10.71%  10.71%    18797358  15178405  15178405 * D_PatchRender_8_8
   5.44%   5.45%  65.08%     7709387   7731831  92258680   _TryRunTics
   5.27%   5.27%  19.50%     7467266   7472500  27648066   hrsc_locate_q
   4.16%   4.17%   4.17%     5898360   5904745   5904745   D_TextureRemapInit
   3.92%   3.93%   4.86%     5553897   5577824   6886665   _BM_S_Init
   2.91%   2.91%   7.08%     4124714   4130946  10036141   flat_remap_mips
   2.55%   2.56%   4.16%     3621658   3627199   5895887   _I_GetTime
...
Instruction cache misses:
  21.92%  21.97%  67.95%     4537883   4547731  14066491   _TryRunTics
  13.47%  13.49%  13.49%     2788927   2793569   2793569   strcmp_8
  12.04%  12.05%  14.27%     2492702   2494135   2953047   _I_GetTime
   9.99%  10.00%  23.55%     2068758   2070361   4874296   hrsc_locate_q
   7.75%   7.77%  13.34%     1604871   1608174   2761865   _GetPackets
   6.71%   6.75%  27.01%     1389157   1397465   5592050   _NetUpdate
   5.50%   5.51%   5.55%     1137739   1140029   1148150   _HGetPacket


Doom1 startup looks then:

Code: Select all

Executed instructions:
  26.20%  26.23%  26.23%    22462729  22483931  22483931   flat_generate_mips_8bitsrc
  10.81%   4.38%   4.38%     9270334   3757356   3757356 * D_PatchRender_8_8
   6.65%   6.66%   6.66%     5701748   5707887   5707887   D_TextureRemapInit
   6.48%   6.51%   8.03%     5553897   5577824   6886665   _BM_S_Init
   5.53%   5.54%   5.54%     4744045   4746646   4746646   strcmp_8


IMHO timedemo could start immediately. When still developing Doom, there's no benefit from looking at the titlescreen longer at startup... Could you remove the timeout? :-)


dml wrote:
Eero Tamminen wrote:Do the strcmp_8 lookups have the same optimization I did for C-side code?
(You might add some note about these to TODO list.)


This I can't remember - there are some string8 copying optimizations in there but I don't know if the string compare is one of them. I'll take a look. I'll update the todo accordingly.


The optimization was upcasing all names on load and upcasing the compared string on call of the function, so that <8 char strings are padded with zeros, and then doing the comparison (inline) using 2 longs. Even faster would be to qsort() names & bsearch() them, but index for that would take a bit more memory, and it was fast enough without.

User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 2025
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Postby Eero Tamminen » Sat Jun 15, 2013 11:33 am

When Doom (v1) runs timedemo with menu being shown on top, DrawPatch+LumpName stuff come up in charts:

Code: Select all

Calls:
   9.31%  28.45%      114380    349621   _P_MobjThinker
   7.54%               92707             copy16_d
   7.35%   7.35%       90318     90318   _SwapLONG
...
Executed instructions:
  29.01%                    22261231                       render_wall_1x1
  11.57%  11.58%  13.92%     8876847   8888934  10684802   _V_DrawPatch
   8.21%   8.22%   8.38%     6302912   6309790   6434303   _W_CacheLumpName
   5.37%                     4123926                       render_transparent_1x1
   3.95%                     3034800                       R_VisPlaneShader
   3.70%   3.70%  22.18%     2836057   2839471  17024979   _P_MobjThinker
   3.36%   3.44%   5.86%     2576037   2637526   4497069   _BM_P_CrossBSPNode
   3.27%   3.28%   3.28%     2513432   2515799   2515799   _R_PointInSubsector
   2.24%   2.24%   2.42%     1717638   1717680   1859956   BM_P_CrossSubsector
   2.12%   2.12%   2.12%     1626357   1627417   1627417   _SwapLONG


All SwapLONG calls and most of SwapShort calls come from V_DrawPatch, see the attached callgraph. copy16_d is part of memmove() which calls are 1/3 from V_CopyRect.

V_DrawPatch() calls SwapSHORT from several places with JSR, here's where it BSRs to SwapLONG and part of it taking most of CPU:

Code: Select all

$03df58 :             lea       8(a6),a0                   0.12% (90354, 376276, 3701)
$03df5c :             move.l    (a0,d2.l*4),-(sp)          0.12% (90354, 2184348, 3537)
$03df60 :             bsr.l     _SwapLONG                  0.12% (90354, 1086272, 0)
$03df66 :             lea       (a6,d0.l),a0               0.12% (90353, 724100, 0)
...
$03df9e :             clr.l     d1                         0.12% (95443, 381936, 2851)
$03dfa0 :             move.b    (a2)+,d1                   1.72% (1316978, 10546900, 154)
$03dfa2 :             move.w    (a4,d1.l*2),(a1)           1.72% (1316978, 21087356, 0)
$03dfa6 :             lea       $280(a1),a1                1.72% (1316978, 5269600, 0)
$03dfaa :             dbra      d0,$3dfa0                  1.72% (1316978, 10940264, 0)
$03dfae :             clr.w     d0                         0.12% (95443, 381824, 3002)
You do not have the required permissions to view the files attached to this post.

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Postby dml » Sat Jun 15, 2013 12:09 pm

Eero Tamminen wrote:Here's single frame rendering from Doom1, when the player goes down the stairs towards a door with transparent monster.


Thanks. There's a *lot* of useful new info here - some important recent changes such as these (below) haven't really shown up until now and I expected to see them sooner. The cache miss rates suggest I should do some more work there.

20.42% 20.42% 20.55% 1738 1738 1749 R_AddSpriteSpans
14.93% 14.93% 14.93% 1271 1271 1271 R_ViewTestSpriteLines

Eero Tamminen wrote:Note: post-processor shows init_font() calls because you've remove framecounter() symbol from doom.sym. In reality they're framecounter() calls.


Disabling the text console stuff has also broken the window resolution indicator which prints xres*yres when you resize the window - it's not controlled by the build flag so it shows a garbage rectangle now. It took me a while to figure out what that was. At first I thought it was Doom drawing a strange icon on resize :)

Eero Tamminen wrote: 28.27% 1334 lowerwall_skip
22.17% 22.17% 1046 1046 R_DoColumnPerspCorrect
10.72% 506 trans_skip
7.20% 340 midwall_skip
6.42% 303 upperwall_skip


I'll need to think about that one since it's unusual to see. It's probably a lot of upper/lower walls which technically pass the wall occlusion test but are 90% below/behind something else and those columns get skipped by per-column occlusion tests during the edge stepping process.

Eero Tamminen wrote: 59.35% 59.35% 59.35% 3851632 3851632 3851632 R_DoColumnPerspCorrect
23.52% 1526070 command_base
[/code]


Sprites still use R_DoColumnPerspCorrect and don't occlude anything so they can really build up the cost of this function with many overlaps. I'm planning an alternate path for sprites with constant z and very little work except column top/bottom trimming.

Eero Tamminen wrote:Here's then the part between render_end and render_begin i.e. "thinking", for next frame...

CPU side:

Code: Select all

Time spent in profile = 0.21728s.
...
Calls:
  14.97%  66.60%        1272      5658   _P_MobjThinker
  10.50%  10.50%         892       892   _P_PointOnDivlineSide
   6.75%   7.27%         573       618   _PIT_CheckLine
   4.14%  14.38%         352      1222   _PIT_AddLineIntercepts
   3.77%   3.77%         320       320   _PIT_CheckThing
   3.67%                 312             copy16_d
   3.38%  10.79%         287       917   _P_BlockThingsIterator
   2.60%   3.15%         221       268   BM_P_CrossSubsector
   2.41%                 205             copy16



P_PointOnDivlineSide is a definite DSP candidate. PIT_CheckLine, PIT_AddLineIntercepts, PIT_CheckThing I think are the collision raytest/clipping and probably also DSP candidates in there.

copy16* I *suspect* is from memcpy calls made from the V_CopyRect dirty rectangle updates for the status bar. That's a guess though.

Eero Tamminen wrote:...
Executed instructions:
12.08% 12.38% 20.44% 40126 41135 67932 _BM_P_CrossBSPNode
9.52% 9.53% 80.59% 31620 31674 267794 _P_MobjThinker
8.02% 8.04% 8.04% 26664 26705 26705 _P_PointOnDivlineSide
7.51% 7.51% 8.06% 24947 24969 26797 BM_P_CrossSubsector
6.82% 6.82% 6.82% 22655 22669 22669 _R_PointInSubsector


Ouch. More still to do here it seems!

Eero Tamminen wrote: 10.98% 11.00% 33.89% 12793 12822 39499 _P_BlockLinesIterator


Not sure what this is yet. Something to do with entities and the blockmap. There is some kind of AI interaction with blockmap (grid of cells indicating overlap with some other stuff - sectors or walls IIRC. I don't use it in BM engine).

Eero Tamminen wrote:DSP side:[code]
Calls:
96.06% 1.67% 63231 1102 P_CrossSubsector


I was going to say 'wow' but P_CrossSubsector is inlined as part of another loop and these are relative measurements for label visits. Still, it's a lot.

Eero Tamminen wrote: 1.20% 1.20% 787 787 TestLineSegVectorBisection
0.86% 567 P_CrossSubsector_doseg
0.74% 1.46% 484 964 InterceptVectors
0.73% 0.73% 480 480 Divs48_Real


Thankfully these ^^^ are being shortcut most of the time, because they are expensive. Divs48_Real is my last-resort floating point divide :) for when no other divide is enough.

Eero Tamminen wrote:DSP side:[code]
Calls:
11.43% 877 lowerwall_skip
10.08% 10.08% 774 774 R_DoColumnPerspCorrect
8.32% 639 upperwall_skip
7.18% 551 command_base
4.68% 359 midwall_skip
3.44% 264 R_CheckBBox_end
3.44% 5.33% 264 409 R_CheckBBox



Again, interesting to see so much skipping. Walls passing occlusion test, but most of the columns being 0 or less high and getting skipped.

Eero Tamminen wrote:Significantly less DSP free than with Doom1 during the whole frame.
=> In Doom1 code, thinking takes slightly more CPU than rendering, in Doom2 rendering takes more time.


I'm going to speculate from the above that Doom2's extensive use of open areas (many sector joins with scene depth) and floor height changes is causing the occlusion testing to work ineffectively, and lots of wall scanning is going on between the player's eye and the first solid surface, without much drawing resulting in between.

This could probably be solved with a different kind of occlusion testing but I'd have to think quite hard about that before mucking about with that now.

Thanks for the detailed reports I'll be digesting it all :-)

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Postby dml » Sat Jun 15, 2013 12:13 pm

Eero Tamminen wrote:I tested trivial optimization of only drawing page lumps in d_main.c just once, and it seems to work fine.

However, Doom apparently does some wait before starting the timedemo, so although v_DrawPatch disappeared, the startup situation didn't much change:


Indeed - the title screen and attract stuff are based on seconds elapsed, not work done - so removing/optimizing the title screen won't have any effect. There's a delay counter in the game code which defines the timeout, I'll locate it and put a build switch around it so it can be skipped.

BTW what Doom command are you using to invoke the timedemo? The command '-timedemo demo1' should skip the title screen and will replay every frame of the demo 1:1 for tick : render (so it will run in slow motion but with no time-adaptive behaviour, so the time to complete the demo will be pure 'work done' for the whole demo).

Eero Tamminen wrote:The optimization was upcasing all names on load and upcasing the compared string on call of the function, so that <8 char strings are padded with zeros, and then doing the comparison (inline) using 2 longs. Even faster would be to qsort() names & bsearch() them, but index for that would take a bit more memory, and it was fast enough without.


Most of the (subsequent) BM string lookups go through a hashtable but the hashtable still uses a strcmp equivalent to check the relevant hash tags. This might be what's showing up. I'll check it later today.

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Postby dml » Sat Jun 15, 2013 12:32 pm

dml wrote:
Eero Tamminen wrote: 28.27% 1334 lowerwall_skip
22.17% 22.17% 1046 1046 R_DoColumnPerspCorrect
10.72% 506 trans_skip
7.20% 340 midwall_skip
6.42% 303 upperwall_skip


I'll need to think about that one since it's unusual to see. It's probably a lot of upper/lower walls which technically pass the wall occlusion test but are 90% below/behind something else and those columns get skipped by per-column occlusion tests during the edge stepping process.


Actually I can explain this one - I keep forgetting 'Calls' in the profiler summary are not actually calls (they don't return) they are visits or encounters. So these 'skip' labels are still visited even for columns which are not skipped. So its just indicating a lot of lower columns get processed generally, which is true for open plan maps with sectors steps/height differences all over the place.

Perhaps it would be better to call that list 'visits' and keep 'calls' for the analysed callgraph output? i.e. labels which are called and return?

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Postby dml » Sat Jun 15, 2013 12:36 pm

Eero Tamminen wrote:All SwapLONG calls and most of SwapShort calls come from V_DrawPatch, see the attached callgraph. copy16_d is part of memmove() which calls are 1/3 from V_CopyRect.


Ok I thought as much (re: memcpy ops from V_CopyRect) although I wasn't sure about SwapLONG etc. I suppose it's reading dimensions from the Intel-formatted patches loaded into memory (they have width/height/offset headers in little-endian format).

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Postby dml » Sat Jun 15, 2013 12:43 pm

So apart from the profiling activity and ponderings over performance hotspots, the good news is the texture mapping column calc changes worked - almost. Not quite done yet because I forgot a step for aspect ratio correction but it basically works otherwise. Precision is slightly improved when the viewer is very close to a surface. The depth cue lighting is now also a bit broken but that should be easy to fix.

[EDIT] ...now fixed

I'll post separately on the math steps used since there are a couple of neat tricks in there now which could be useful in other projects. :-)
Last edited by dml on Sat Jun 15, 2013 10:06 pm, edited 2 times in total.

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Postby dml » Sat Jun 15, 2013 9:36 pm

Here are some texture mapping tricks I use in BM. This mainly helps eliminate divides in the polygon/edge setup part, without using tables or changing pixel/texel stepping part. I haven't seen this before, but it's not terribly complicated so I'm not claiming it's in any way new. But it's interesting in any case.

Whether this is useful or not depends on how many edges you need to set up and what you're doing with attributes but it can be a good saving if divide is expensive, mul is cheap and/or you don't have space for big tables (e.g DSP).

The 'classic' way to do perspective correction is as follows:

Code: Select all

; set up screen-z interpolant for a range or span of pixels
t1 = 1/z1;
t2 = 1/z2;

; set up screen-u interpolant
uz1 = u1*t1;
uz2 = u2*t2;

; interpolate screen-z, screen-u for pixel 'i'
ti = linear_interpolate(t1,t2,i);
uzi = linear_interpolate(uz1,uz2,i);

; find world-u at pixel 'i'
u = uzi / ti;


The 'trick' I brewed for BM takes advantage of a relationship between the true perspective curve generated across the range for (uz/t), the curve generated by the inverse range (uz*t) and the z interpolant direction. In other words, by flipping the z interpolant range, the range being interpolated can be u*z instead of u/z.

So the modified code looks like this:

Code: Select all

; set up screen-z interpolant
; USE *INVERTED Z RANGE*, and reciprocal of [1/z] i.e. just [z]
; this saves two divides per Z range, or one per Z term
t1 = z2;
t2 = z1;

; set up screen-u interpolant
uz1 = u1*t1;
uz2 = u2*t2;

; interpolate screen-z, screen-u
ti = lerp(t1,t2,i);
uzi = lerp(uz1,uz2,i);

; find world-u
u = uzi / ti;


...the problem with this is that you can't easily access the true z at each pixel, which might be useful for depth cue, a linear (?) zbuffer or something else. However you can easily recover it as follows:

Code: Select all

; set up screen-z interpolant
; USE *INVERTED RANGE* -, reciprocal of [1/z] i.e. just [z]
t1 = z2;
t2 = z1;
; interpolating to retrieve true z: lerp(z1*z2, z2*z1, i) is a constant! so we don't have to interpolate.
zc = z1*z2;

; set up screen-u interpolant
uz1 = u1*t1;
uz2 = u2*t2;

; interpolate screen-z, screen-u
ti = lerp(t1,t2,i);
uzi = lerp(uz1,uz2,i);

; rcp
rt = 1/ti;

; find world-u
u = uzi * rt;
; find true z
z = zc * rt;


One more trick which can be used with the above - you can recover an approximate z even without a multiply, just by flipping [ti] with respect to the original z1,z2 range. The downside is, it's not perspective correct. It can do for depth cue lighting though or low precision zbuffer.

e.g.

Code: Select all

approxz = -ti + z1 + z2;


There are a few other unusual things in the BM renderer. When I dig more of it out I'll post.

[EDIT]

One other TM trick worth noting from the older BM code (but now excised, due to simplification efforts) is the same range-inversion approach combined with extraction of the perspective curve itself into a distortion curve, which can be applied cheaply to any attribute while stepping. This means you don't have to do any edge setup for the attributes (good if you have many of them e.g. u,v,colour.. etc) as they can be combined with the curve term using just a multiply.

The problem with it IIRC is the fact it degenerates when z1=z2 (view-perpendicular surfaces) and I hid this problem previously by using a linear texture mapping path for those walls. This was fine because it also speeds up those walls. However the whole thing was quite complicated and used more code space, and I don't like special cases in code, so it got chucked away in favour of the simpler method (which ends up faster because there is only a single u term and depth cue which is easy to generate several different ways).

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Postby dml » Sat Jun 15, 2013 10:22 pm

Have just committed some changes which repair and enable diffuse prelighting on walls (to help distinguish walls at different angles by shading them according to some notional light direction).

The effect is deliberately muted at a few % so it doesn't detract from sector prelighting made by level artists but it's enough to highlight more of the geometry that's there already and sometimes not very obvious.

EvilFranky
Atari Super Hero
Atari Super Hero
Posts: 871
Joined: Thu Sep 11, 2003 10:49 pm
Location: UK
Contact:

Re: Bad Mood : Falcon030 'Doom'

Postby EvilFranky » Sun Jun 16, 2013 6:57 am

Is that a bit of a Falcon 'bonus' Doug? :)

Sounds like something that can only be implemented properly using a hi-colour mode?

Any pre and post screenshot examples? :mrgreen:

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Postby dml » Sun Jun 16, 2013 9:04 am

EvilFranky wrote:Is that a bit of a Falcon 'bonus' Doug? :)


An early Falcon bonus yes.

EvilFranky wrote:Sounds like something that can only be implemented properly using a hi-colour mode?
Any pre and post screenshot examples? :mrgreen:


There are some older screenshots waaaay back in this thread but I'll post new ones later today which will be easier to find :)

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Postby dml » Sun Jun 16, 2013 1:02 pm

Today I've been playing with an 'invisibility shader' which produces a predator-like effect on those spectre monsters (the kind of big pink things which are sometimes invisible). On the PC they use a sort of shadow mask which darkens the background and adds animated speckles.

I'm trying something else which offsets the existing framebuffer using the sprite colour index as part of the offset pattern.

It's beginning to work quite well for things that move, but taking a screenshot is useless because the object is impossible to find when it stops moving :-) It will take some time and experiments to get the best effect from the colour index -> offset mappings and to deal with errors at the framebuffer edges.

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Postby dml » Sun Jun 16, 2013 1:54 pm

Some early tests with the 'invisible' shader on all sprites.

inv1.jpg

inv2.jpg

inv3.jpg


There are several ways to do this kind of thing, some being more 'correct' than others but the trick here is to make it simple and fast enough to fill screen area without killing a base Falcon. So far its looking like it will be practical.
You do not have the required permissions to view the files attached to this post.

EvilFranky
Atari Super Hero
Atari Super Hero
Posts: 871
Joined: Thu Sep 11, 2003 10:49 pm
Location: UK
Contact:

Re: Bad Mood : Falcon030 'Doom'

Postby EvilFranky » Sun Jun 16, 2013 1:57 pm

Fantastic! Another Falcon bonus, well done Doug :cheers:

User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 2025
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Postby Eero Tamminen » Sun Jun 16, 2013 1:57 pm

dml wrote:Today I've been playing with an 'invisibility shader' which produces a predator-like effect on those spectre monsters (the kind of big pink things which are sometimes invisible). On the PC they use a sort of shadow mask which darkens the background and adds animated speckles.

I'm trying something else which offsets the existing framebuffer using the sprite colour index as part of the offset pattern.

It's beginning to work quite well for things that move, but taking a screenshot is useless because the object is impossible to find when it stops moving :-) It will take some time and experiments to get the best effect from the colour index -> offset mappings and to deal with errors at the framebuffer edges.


If you don't want to change pixel colors, e.g. to:
  • quantize/posterize (color+1 >> 1 << 1), or
  • darken/brighten/dither
the colors behind it...

You could consider using the sprite center offset for the bg color offset. This way the creature outline would be more visible, hopefully also when it's not moving.

As enemy sprites are of known size, one doesn't need integer sqrt, offset might just use fixed factor with something like (dx+dy) >> 4, maybe capped to some max value for corners of the sprite.

Some pixel sparkles might be good too ("if (ubyte_sparkler += 4 < 4) add_sparkle()").


Social Media

     

Return to “680x0”

Who is online

Users browsing this forum: No registered users and 6 guests