Bad Mood : Falcon030 'Doom'

dma
Atari God
Posts: 1223
Joined: Wed Nov 20, 2002 11:22 pm
Location: France

Re: Bad Mood : Falcon030 'Doom'

Post by dma »

Reading about your optimisation research, it sounds like DOOM will be able to run on the ST in the end. :lol: (joking, eh)
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: Don't catch the exact point when the player dies / timedemo ends, like they do with the Doom II timedemo. There's some credits screen shown for a while before profiling ends. Could you suggest a symbol which isn't called, or doesn't change, during normal gameplay, but does when the timedemo ends?
Hi, sorry I forgot about this. The demo/game state management area is quite hard to follow because it runs as a sort of service. There is a symbol D_AdvanceDemo which IIRC is called every time the attract mode switches state e.g. from one title/splash screen to another, or in and out of demo mode. It won't happen during the demo, but will at the beginning and end (e.g. before loading, after death).
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:The demo/game state management area is quite hard to follow because it runs as a sort of service. There is a symbol D_AdvanceDemo which IIRC is called every time the attract mode switches state e.g. from one title/splash screen to another, or in and out of demo mode. It won't happen during the demo, but will at the beginning and end (e.g. before loading, after death).
Thanks, that seems to work fine!

With Doom I, the first startup seems to take ~2.5 min while it generates/caches the data. With a real Atari hard disk that's probably a bit longer. There are no resource loads during normal gameplay.

As for the latest worst-frame profiles for Doom I, here's the thinking part (gcc 2.x, TIMERBASE=1)...

CPU side:

Code:

Time spent in profile = 0.23188s.

Visits/calls:
- max = 1057, in _PIT_CheckLine at 0x32564, on line 2281
- 6332 in total
Executed instructions:
- max = 4333, in _R_DrawColumn+90 at 0x3d51c, on line 3329
- 379634 in total
Used cycles:
- max = 63476, in _PIT_CheckLine+780 at 0x32870, on line 2429
- 3719900 in total
Instruction cache misses:
- max = 3072, in _PIT_CheckLine+778 at 0x3286e, on line 2428
- 91763 in total
...
Executed instructions:
  13.85%  13.88%  35.48%       52567     52682    134712   _P_CheckPosition
  12.37%  12.39%  12.39%       46959     47024     47024   _R_PointInSubsector
  11.48%  11.62%  14.51%       43564     44115     55070   _BM_P_CrossBSPNode
  10.43%  10.45%  76.12%       39612     39688    288978   _P_RunThinkers
   8.09%   8.10%   8.10%       30713     30733     30733   _BM_A_Mux3x2
   7.51%   7.52%   8.50%       28494     28559     32274   _R_DrawColumn
   4.79%   4.81%   6.11%       18187     18257     23198   _PIT_CheckLine
   4.40%   4.40%   4.55%       16689     16703     17255   _V_DrawPatch
   4.29%   4.30%   5.27%       16293     16333     19992   _PIT_CheckThing
   2.09%   2.10%  10.61%        7937      7955     40261   _R_DrawVisSprite
   1.94%   1.94%  17.48%        7370      7370     66362   _P_LookForPlayers
   1.66%   1.66%  16.17%        6296      6314     61384   _BM_P_CheckSight
   1.47%   1.48%   1.48%        5595      5622      5622   _P_UpdateSpecials
   1.38%   1.38%  38.83%        5230      5237    147427   _P_TryMove
   1.15%   1.15%  31.57%        4354      4354    119835   _P_Move
   1.05%   1.05%  52.69%        3972      3979    200044   _P_SetMobjState
   0.83%                        3139                       copy16_d
   0.76%   0.76%   3.89%        2895      2895     14766   _P_SetThingPosition
   0.59%   0.59%  33.47%        2223      2223    127071   _A_Chase
   0.55%   0.55%   0.55%        2091      2091      2091   _P_UnsetThingPosition
   0.54%   0.54%  25.63%        2037      2044     97292   _P_NewChaseDir
...
Instruction cache misses:
  19.17%  19.23%  41.08%       17594     17648     37696   _P_CheckPosition
  12.80%  12.83%  13.85%       11748     11776     12707   _PIT_CheckLine
   8.81%   8.85%  85.22%        8085      8122     78203   _P_RunThinkers
   5.66%   5.70%   5.70%        5198      5232      5232   _R_PointInSubsector
   5.51%   5.94%   6.36%        5052      5453      5840   _BM_P_CrossBSPNode
   4.31%   4.32%  10.68%        3953      3962      9802   _BM_P_CheckSight
   3.75%   3.75%  46.63%        3440      3443     42790   _P_TryMove
   3.38%   3.41%   3.57%        3101      3125      3276   _PIT_CheckThing
   2.70%   2.70%  60.14%        2474      2477     55183   _P_SetMobjState
   2.67%   2.68%   4.54%        2454      2463      4162   _R_DrawVisSprite
   2.62%   2.62%  38.50%        2403      2403     35329   _P_Move
   2.59%   2.59%  13.59%        2374      2374     12471   _P_LookForPlayers
   2.00%   2.00%   3.41%        1832      1832      3133   _P_SetThingPosition
   1.64%   1.68%   1.83%        1509      1540      1679   _R_DrawColumn
   1.56%   1.56%  42.28%        1431      1431     38793   _A_Chase
   1.35%   1.35%  31.62%        1240      1243     29014   _P_NewChaseDir
DSP side:

Code:

Used cycles:
  84.77%                     6306918                       command_base
   7.23%                      537714                       ALGO_P_CrossBSPNode
   5.55%                      412866                       P_CrossSubsector_body
   1.05%   1.05%   1.05%       78400     78428     78428   InterceptVectorsUF
   0.84%                       62308                       Divs48_Real
...
Visits/calls:
  22.25%  22.25%         627       627   InterceptVectorsUF
  20.40%  20.40%         575       575   TestLineSegVectorBisection
  14.51%                 409             P_CrossSubsector_body
  12.46%                 351             VECOP_return
  12.21%  12.21%         344       344   InterceptVectors
  11.99%                 338             Divs48_Real
   3.09%                  87             ALGO_P_CrossBSPNode
   3.09%                  87             command_base
Rendering side...

CPU part:

Code:

Time spent in profile = 0.20246s.

Visits/calls:
- max = 188, in R_AdvanceSurface_NMip1 at 0x53f58, on line 2149
- 2945 in total
Executed instructions:
- max = 4769, in R_VisPlaneShaderQuickMip+202 at 0x537b6, on line 1743
- 351945 in total
Used cycles:
- max = 74208, in R_VisPlaneShaderQuickMip+200 at 0x537b4, on line 1742
- 3247900 in total
Instruction cache misses:
- max = 646, in R_AddSpriteSpans+48 at 0x52c18, on line 1069
- 27882 in total
...
Executed instructions:
  18.39%                       64717                       R_VisPlaneShaderQuickMip
  17.38%                       61151                       R_SpriteColumnShader_Masked2
  12.18%                       42858                       R_DrawTSurface_Masked1
   6.31%                       22195                       R_AdvanceSurface_NMip1
   5.52%   5.53%   5.53%       19416     19472     19472   stream_texture
   4.79%                       16841                       R_AdvanceSurface_NMip2
   4.45%   4.45%   4.45%       15657     15675     15675   R_ViewTestSpriteLines
   3.32%                       11687                       R_StackTransparentSurface
   3.24%                       11405                       R_BSPHyperPlane
   3.14%   3.15%   3.30%       11058     11092     11617   R_AddSpriteSpans
   2.32%   2.33%   2.33%        8170      8190      8190   _BM_A_Mux1x2
   2.07%   2.10%   2.42%        7269      7392      8510   stack_visplane_area
   1.99%                        7002                       R_AdvanceSurface_NMip3
   1.81%                        6353                       R_AddLine_loop
   1.40%                        4928                       R_AdvanceSurface_TMip2
   1.06%   1.06%   1.06%        3741      3745      3745   add_wall_segment
   0.99%   0.99%   0.99%        3478      3478      3478   R_SetSubSectorLuma
   0.91%                        3198                       build_ssector
   0.87%   0.88%   0.88%        3079      3099      3099   init_stategroups
   0.63%   0.63%   2.83%        2230      2230      9952   get_ssector
   0.59%   0.59%  21.69%        2067      2094     76330   R_FlushDeferredSurfaces
   0.58%                        2026                       R_DrawSurface_NMip
...
Used cycles:
  22.89%                      743488                       R_VisPlaneShaderQuickMip
  12.70%                      412512                       R_SpriteColumnShader_Masked2
   9.07%   9.11%   9.11%      294652    295724    295724   stream_texture
   7.78%                      252612                       R_DrawTSurface_Masked1
   4.94%                      160588                       R_AdvanceSurface_NMip1
   4.84%   4.85%   4.85%      157072    157456    157456   R_ViewTestSpriteLines
   3.76%                      122264                       R_AdvanceSurface_NMip2
   3.62%   3.64%   3.85%      117520    118112    124936   R_AddSpriteSpans
   3.52%                      114196                       R_StackTransparentSurface
   3.46%                      112384                       R_BSPHyperPlane
   2.71%   2.72%   2.72%       87888     88192     88192   _BM_A_Mux1x2
...
Instruction cache misses:
  21.14%  21.20%  22.14%        5893      5910      6173   R_AddSpriteSpans
   6.71%   6.74%   6.74%        1870      1879      1879   R_ViewTestSpriteLines
   5.26%                        1467                       build_ssector
   4.81%                        1342                       R_AddLine_loop
   4.17%   4.22%  21.30%        1163      1177      5938   R_FlushDeferredSurfaces
   3.29%                         916                       R_AddOverlappingSprites
   3.17%   3.17%  29.20%         883       883      8141   R_SubSectorTryFlush
DSP side:

Code:

Used cycles:
  34.84%                     2263440                       command_base
  21.71%  21.71%  21.71%     1410012   1410012   1410012   VPRenderSpanQuickMip
   9.03%                      586500                       R_VPLoadTexture
   7.57%   9.40%   9.40%      491538    610408    610408   R_DoColumnPerspCorrect
   5.50%   5.50%   5.50%      357400    357400    357400   extract_subvisplane
   3.23%                      209658                       R_DoColumnTextureUV
   3.19%                      207464                       R_ViewTestAddLine
   1.89%                      123080                       R_VPRenderPlane
   1.78%                      115800                       project_node
   1.07%   1.14%   3.92%       69232     73784    254728   AddTransWall
   1.06%   1.08%   4.10%       69104     70218    266220   AddUpperWall
   1.03%                       67080                       R_CheckBBoxPair
   1.03%   1.06%   2.76%       66964     68926    179106   AddLowerWall
   0.99%   1.02%   7.01%       64062     66386    455574   AddMidWall
   0.99%   2.27%   2.27%       64056    147554    147554   R_DoColumnConstantClone
   0.76%   0.76%   0.76%       49500     49500     49500   R_BufferSurface
   0.68%                       44090                       R_ViewTestSpriteLine
QuickMip stuff is high on both the CPU & DSP side. Flushing, with the stuff it calls, seems to be causing i-cache misses.

If you have about finished the optimizations for the timedemo-compatible parts of the thinking phase, I could switch the worst-frame profiling to TIMERBASE=3 again.


Btw. I get one error on the console when BM starts:

Code:

InitTextureserror: could not map flat [F1_START] via BM API
Is this something known?

With Doom II, BM tries to access the following nonexistent directory: BMC/SPR/VILE. The BMC/FLT subdirectory created under the cache directory is empty even after running both Doom I & Doom II.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

Here's also a profile for the full Doom I timedemo. Unlike the earlier profiles, it doesn't start from the first P_LineAttack call, but earlier, from the first A_Chase call.

CPU side:

Code:

Time spent in profile = 111.73971s.

Visits/calls:
- max = 175998, in copy16_d at 0x56e10, on line 14504
- 2174274 in total
Executed instructions:
- max = 1025620, in R_VisPlaneShaderQuickMip+202 at 0x537b6, on line 12643
- 195219745 in total
Used cycles:
- max = 16201124, in R_VisPlaneShaderQuickMip+200 at 0x537b4, on line 12642
- 1792583640 in total
Instruction cache misses:
- max = 441287, in _P_RunThinkers+268 at 0x3a96a, on line 8172
- 24581582 in total
...
Executed instructions:
   8.88%                    17333744                       R_AdvanceSurface_TMip0
   8.06%   8.16%   8.70%    15740734  15925570  16978783   _BM_P_CrossBSPNode
   7.05%                    13762875                       R_VisPlaneShaderQuickMip
   6.74%   6.75%  30.01%    13162760  13184062  58578535   _P_RunThinkers
   6.65%                    12989756                       R_AdvanceSurface_NMip0
   5.14%                    10039907                       R_SpriteColumnShader_Masked2
   3.77%                     7367649                       R_AdvanceSurface_NMip1
   3.73%                     7285190                       R_AdvanceSurface_TMip1
   3.15%   3.16%   3.29%     6152144   6163795   6420717   _R_DrawColumn
   3.11%   3.11%   3.11%     6080012   6079945   6079945 * _BM_A_Mux3x2
   3.04%   3.04%   3.17%     5927874   5933514   6197347   _R_PointInSubsector
   2.99%   3.00%   3.25%     5842231   5851897   6342972   _V_DrawPatch
   2.93%   2.94%   6.89%     5720589   5733546  13456957   _P_CheckPosition
   2.28%                     4441648                       R_DrawTSurface_Masked1
   1.98%   1.99%   2.17%     3863784   3877200   4232881   stream_texture
   1.40%                     2725760                       R_AdvanceSurface_NMip2
   1.26%   1.27%  10.94%     2469461   2473636  21365567   _P_LookForPlayers
   1.20%   1.21%   4.57%     2346797   2353235   8930069   _R_DrawVisSprite
   1.13%   1.13%   1.13%     2197346   2198606   2198606   _BM_A_Mux2x2
   1.10%   1.10%   9.90%     2145626   2155984  19327954   _BM_P_CheckSight
   1.08%                     2110282                       R_BSPHyperPlane
   1.00%                     1957029                       R_StackTransparentSurface
   0.95%   0.95%   1.02%     1856451   1861708   1999475   _P_UpdateSpecials
   0.91%   0.91%   0.97%     1767765   1771830   1886364   R_ViewTestSpriteLines
   0.87%   0.88%   1.03%     1702978   1709257   2013488   _PIT_CheckLine
   0.85%                     1659082                       R_AdvanceSurface_TMip2
   0.83%   0.83%   0.83%     1610563   1611663   1611663   _BM_A_Mux1x2
   0.74%                     1442033                       R_AddLine_loop
   0.71%   0.71%   0.76%     1391610   1395026   1484345   _PIT_CheckThing
   0.70%   0.70%   0.78%     1365293   1368782   1525302   R_AddSpriteSpans
   0.68%   0.68%   0.71%     1323970   1326014   1385895   init_stategroups
   0.67%                     1300430                       R_VisPlaneShaderWarp
   0.61%   0.61%  19.64%     1181773   1184537  38349732   _P_SetMobjState
   0.51%   0.51%   0.54%      993652    995646   1057567   R_SetSubSectorLuma
   0.51%                      986931                       copy16_d
   0.50%   0.51%   0.55%      979733    997206   1074135   stack_visplane_area

Used cycles:
   8.98%                   160970528                       R_VisPlaneShaderQuickMip
   8.67%   8.81%   9.36%   155427272 157942540 167785476   _BM_P_CrossBSPNode
   6.72%                   120376872                       R_AdvanceSurface_TMip0
   5.87%   5.89%  31.41%   105182088 105642172 563053292   _P_RunThinkers
   5.16%                    92466824                       R_AdvanceSurface_NMip0
   3.78%                    67752036                       R_SpriteColumnShader_Masked2
   3.27%   3.29%   3.47%    58631892  58898244  62237536   stream_texture
   3.14%   3.15%   3.29%    56208744  56447780  58915972   _R_DrawColumn
   3.07%   3.08%   7.34%    55006164  55260180 131661912   _P_CheckPosition
   3.00%   3.00%   3.00%    53787584  53808236  53808236   _BM_A_Mux3x2
   2.96%                    53007324                       R_AdvanceSurface_NMip1
   2.83%                    50811876                       R_AdvanceSurface_TMip1
   2.46%   2.47%   2.76%    44138360  44331724  49414868   _V_DrawPatch
   1.99%   2.00%   2.13%    35639496  35786720  38231024   _R_PointInSubsector
   1.75%   1.76%  11.22%    31353176  31526736 201122240   _BM_P_CheckSight
   1.60%   1.61%   1.77%    28647508  28776152  31788204   _PIT_CheckLine
   1.49%                    26661392                       R_DrawTSurface_Masked1
   1.29%   1.30%   4.67%    23197192  23314416  83791336   _R_DrawVisSprite
   1.28%   1.29%   1.36%    22951200  23054480  24347164   _P_UpdateSpecials
   1.23%   1.24%  12.27%    22061980  22157364 219986268   _P_LookForPlayers
...
Instruction cache misses:
  10.30%  10.35%  59.69%     2531672   2543028  14673456   _P_RunThinkers
   7.45%   7.48%  15.42%     1832323   1838568   3790031   _P_CheckPosition
   6.95%   7.51%   7.72%     1709433   1846350   1896794   _BM_P_CrossBSPNode
   5.47%   5.49%  13.24%     1345721   1349484   3255528   _BM_P_CheckSight
   4.70%   4.71%   4.89%     1154917   1157890   1202419   _PIT_CheckLine
   3.30%   3.31%  17.24%      811840    814022   4237675   _P_LookForPlayers
   3.17%   3.18%   3.33%      779144    780871    817637   R_AddSpriteSpans
   2.95%   2.95%  40.84%      724927    726329  10038799   _P_SetMobjState
   2.85%   2.86%   4.70%      700489    703285   1156491   _R_DrawVisSprite
   2.67%   2.69%   2.74%      657187    660850    673259   _R_PointInSubsector
   1.95%   1.96%  19.60%      479552    481052   4817158   _P_TryMove
   1.69%   1.71%   1.77%      415062    421135    434763   _R_DrawColumn
   1.69%   1.71%   2.11%      414785    419617    517605   _V_DrawPatch
   1.68%   1.68%   3.90%      412976    413595    957796   _V_CopyRect
   1.37%   1.37%  19.14%      337239    337814   4704346   _A_Look
   1.34%   1.35%   2.38%      330298    330818    584228   _P_SetThingPosition
   1.33%   1.34%   1.36%      326692    328700    333499   R_ViewTestSpriteLines
   1.32%   1.32%  14.57%      324315    325183   3581684   _P_CheckSight
   1.27%   1.27%  17.72%      311809    312287   4355208   _A_Chase
   1.23%   1.23%   1.27%      301649    303345    311789   _PIT_CheckThing
   1.17%   1.17%   7.02%      286676    287519   1725232   _STlib_drawNum
   1.13%   1.14%   1.15%      278497    279436    282632   _PIT_AddLineIntercepts_L
   1.10%   1.10%   3.61%      269294    270222    886508   _P_PathTraverse
   1.00%   1.00%  15.51%      244598    245307   3813141   _P_Move
E.g. stream_texture and stack_visplane were higher on the worst frame, but in general the worst frame and the whole-timedemo average are getting fairly close in terms of which functionality is most expensive.
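
As a rough cross-check of the CPU summary figures quoted so far, the time, instruction and cycle totals can be turned into an effective clock rate and an average cycles-per-instruction figure. This is only a sketch; the numbers are copied from the profile headers above, and the helper name `summarize` is made up for the example.

```python
# Summary figures copied from the three CPU profiles above:
# (label, seconds in profile, executed instructions, used cycles)
PROFILES = [
    ("Doom I worst frame, thinking", 0.23188, 379634, 3719900),
    ("Doom I worst frame, rendering", 0.20246, 351945, 3247900),
    ("Doom I full timedemo", 111.73971, 195219745, 1792583640),
]

def summarize(seconds, instructions, cycles):
    """Return (effective clock in MHz, average cycles per instruction)."""
    return cycles / seconds / 1e6, cycles / instructions

for label, seconds, instructions, cycles in PROFILES:
    mhz, cpi = summarize(seconds, instructions, cycles)
    print(f"{label:32s} {mhz:6.2f} MHz  {cpi:5.2f} cycles/instruction")
```

All three come out at roughly 16.04 MHz and between 9 and 10 cycles per instruction, so the worst frames do not look unusual in that respect compared with the whole run.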

DSP side:

Code:

Used cycles:
  48.27%                  1730650824                       command_base
  18.06%  19.34%  19.34%   647582434 693331682 693331682   R_DoColumnPerspCorrect
   8.90%   8.90%   8.90%   319216842 319216842 319216842   VPRenderSpanQuickMip
   5.76%                   206496458                       ALGO_P_CrossBSPNode
   3.44%                   123435498                       R_VPLoadTexture
   2.33%                    83666106                       P_CrossSubsector_body
   1.73%                    61959168                       R_DoColumnTextureUV
   1.65%   1.65%   1.65%    59319566  59319566  59319566   extract_subvisplane
   0.90%   0.93%  15.58%    32381858  33226524 558545954   AddMidWall
   0.90%   0.91%   1.89%    32100864  32462482  67821952   AddLowerWall
   0.88%                    31536670                       R_ViewTestAddLine
   0.81%   0.81%   0.81%    28867582  28867582  28867582   VPRenderSpanWarp
   0.80%   0.80%   0.80%    28712952  28720104  28720104   InterceptVectorsUF
   0.66%   0.00%   0.00%    23546370     95358     95358 * Divs48_Real
   0.57%                    20292740                       R_VPRenderPlane
   0.56%                    20206338                       project_node
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

Hm, it seems there might still be something to optimize before looking more into TIMERBASE=3.

I got this when I also started the worst-frame profiling from the first A_Chase() call (instead of LineAttack) in the Doom I timedemo:

Code:

CPU side:
Time spent in profile = 0.29958s.
...
Executed instructions:
  61.20%                      411147                       R_SpriteColumnShader_Masked2
  28.03%                      188337                       R_AdvanceSurface_NMip0
   5.58%   5.58%   5.58%       37477     37497     37497   _BM_A_Mux3x2
   3.04%                       20454                       R_StackTransparentSurface
...
Visits/calls:
  47.36%                 287             R_AdvanceSurface_NMip0
   9.90%                  60             init_font
   2.31%                  14             R_BSPHyperPlane
   1.82%   1.82%          11        11   _BM_A_Mux3x2
   1.82%                  11             _audio_mux_asm
   1.82%   1.82%          11        11   _frame_event
   1.82%   3.63%          11        22   _subframe_block
   1.82%   7.26%          11        44   _audio_mux_frame
   1.65%                  10             R_AddLine_loop
   1.49%                   9             R_AddLine_invisible
   1.49%   4.29%           9        26   render_wall
   1.32%   1.32%           8         8   cache_resource
   1.32%   1.32%           8         8   add_wall_segment
   1.32%   1.32%           8         8   update_dirty_sector
   1.16%                   7             R_TransparentSurfaceLoop
   1.16%                   7             R_TransparentSurfaceNext
   1.16%                   7             R_DrawTSurface_Masked2
   1.16%                   7             R_StackTransparentSurface
   1.16%                   7             R_SpriteColumnShader_Masked2
   0.66%                   4             R_RenderBSPNode
   0.66%                   4             ssector_node
   0.66%                   4             R_PopBSPNode
...
DSP side:
Used cycles:
  64.52%                     6201732                       command_base
  29.17%  29.88%  29.88%     2803770   2872548   2872548   R_DoColumnPerspCorrect
   2.58%                      247768                       R_DoColumnTextureUV
   1.40%   1.41%   4.15%      134764    135578    398976   AddTransWall
   0.83%   2.68%   2.68%       79974    257714    257714   R_DoColumnConstantClone
   0.70%   0.70%  30.59%       66828     66926   2940288   AddMidWall
The most time-consuming instructions are in R_SpriteColumnShader_Masked2:

Code:

$054e8e :             jmp       $54e92(pc,d0.w*2)          0.33% (2192, 26304, 0)
$054e92 :             bra.s     $54eca                     0.12% (838, 6760, 14)
$054e94 :             bra.s     $54ebe                     0.07% (474, 3812, 5)
$054e96 :             bra.s     $54eb2                     0.06% (376, 3016, 4)
$054e98 :             bra.s     $54ea6                     0.08% (504, 4056, 12)
$054e9a :             move.b    (a2,d4.w),d0               1.94% (13056, 156840, 0)
$054e9e :             adda.l    d6,a0                      1.94% (13056, 156, 45)
$054ea0 :             move.w    (a5,d0.w*2),(a0)           1.94% (13056, 209372, 37)
$054ea4 :             addx.l    d3,d4                      1.94% (13056, 28, 8)
$054ea6 :             move.b    (a2,d4.w),d0               2.02% (13560, 162892, 0)
$054eaa :             adda.l    d6,a0                      2.02% (13560, 196, 53)
$054eac :             move.w    (a5,d0.w*2),(a0)           2.02% (13560, 217428, 48)
$054eb0 :             addx.l    d3,d4                      2.02% (13560, 32, 10)
$054eb2 :             move.b    (a2,d4.w),d0               2.07% (13936, 167412, 0)
$054eb6 :             adda.l    d6,a0                      2.07% (13936, 212, 57)
$054eb8 :             move.w    (a5,d0.w*2),(a0)           2.07% (13936, 223052, 22)
$054ebc :             addx.l    d3,d4                      2.07% (13936, 16, 2)
$054ebe :             move.b    (a2,d4.w),d0               2.14% (14410, 173204, 0)
$054ec2 :             adda.l    d6,a0                      2.14% (14410, 64, 14)
$054ec4 :             move.w    (a5,d0.w*2),(a0)           2.14% (14410, 231000, 12)
$054ec8 :             addx.l    d3,d4                      2.14% (14410, 28, 6)
$054eca :             dbra      d1,$54e9a                  2.27% (15248, 130976, 0)
$054ece :             move.w    d4,d0                      0.33% (2192, 8768, 23)
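
To see where the cycles go inside that loop, the per-instruction figures can be divided out. The sketch below assumes the bracketed triple means (times executed, cycles, i-cache misses); the regular expression and the function name are made up for this example and only handle lines shaped like the ones in the listing.

```python
import re

# A few lines copied from the disassembly above; the bracketed triple is
# read here as (times executed, cycles, i-cache misses).
LISTING = """\
$054e9a :             move.b    (a2,d4.w),d0               1.94% (13056, 156840, 0)
$054e9e :             adda.l    d6,a0                      1.94% (13056, 156, 45)
$054ea0 :             move.w    (a5,d0.w*2),(a0)           1.94% (13056, 209372, 37)
$054ea4 :             addx.l    d3,d4                      1.94% (13056, 28, 8)
$054eca :             dbra      d1,$54e9a                  2.27% (15248, 130976, 0)
"""

LINE = re.compile(
    r"^\$(\w+) :\s+(\S+)\s+(\S+)\s+[\d.]+% \((\d+), (\d+), (\d+)\)$"
)

def cycles_per_execution(listing):
    """Return a list of (mnemonic, operands, average cycles per execution)."""
    rows = []
    for line in listing.splitlines():
        m = LINE.match(line)
        if m:
            _addr, mnemonic, operands, count, cycles, _misses = m.groups()
            rows.append((mnemonic, operands, int(cycles) / int(count)))
    return rows

for mnemonic, operands, avg in cycles_per_execution(LISTING):
    print(f"{mnemonic:8s} {operands:20s} {avg:6.2f}")
```

On these figures the texel read and the colour write are attributed about 12 and 16 cycles per execution, while adda and addx are attributed almost nothing, so nearly all of the per-pixel cost sits on the two memory accesses.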
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Thanks for the results, I'm still looking through them.
Eero Tamminen wrote:Hm, it seems there might still be something to optimize before looking more into TIMERBASE=3.
I don't think this one is an optimization problem TBH, at least not anymore. The main optimization which had been applied there was turning pixel-testing for transparency into solid runs and skips. Having done that, and with the code caching fully, there isn't really anything else left to do with it beyond tweaks. Further optimizations will result in features being removed (such as lighting of sprites).
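
For anyone following along, the "solid runs and skips" idea can be sketched roughly like this. It is not the BadMood code, just a toy illustration in Python: `TRANSPARENT`, `encode_column` and `draw_column` are invented names, and scaling and lighting are left out entirely.

```python
TRANSPARENT = None  # hypothetical marker for a see-through texel

def encode_column(texels):
    """Convert one sprite column into (start_row, [texel, ...]) solid runs."""
    runs = []
    current = None
    for y, texel in enumerate(texels):
        if texel is TRANSPARENT:
            current = None              # a gap ends the current run
        elif current is None:
            current = (y, [texel])      # first opaque texel starts a new run
            runs.append(current)
        else:
            current[1].append(texel)    # extend the run in progress
    return runs

def draw_column(runs, dest, colormap):
    """Draw pre-encoded runs; the inner loop has no transparency test."""
    for start, texels in runs:
        for offset, texel in enumerate(texels):
            dest[start + offset] = colormap[texel]

column = [TRANSPARENT, 3, 4, TRANSPARENT, TRANSPARENT, 5]
runs = encode_column(column)
dest = [0] * len(column)
draw_column(runs, dest, colormap=list(range(100, 110)))
```

The per-pixel test moves out of the draw loop into a one-off encoding step, which is why there is little left to gain in the loop itself afterwards.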

The primary problem is the number of objects, and their proximity, demanded by the game at that point in time. Unlike walls, sprites don't influence the occlusion buffers. They are clipped by this system but don't contribute to it. So overdraw occurs and it's not avoidable. This isn't specific to BadMood; it's a generic problem with sprites in these engines.
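
A toy model of that overdraw effect, reduced to whole screen columns, might look like the sketch below. Everything here is an assumption made for illustration (one depth per object, solid walls only, no vertical extent); it is not how the real occlusion buffers are organised.

```python
def render(width, walls, sprites):
    """walls/sprites: lists of (depth, first_column, last_column).

    Returns the number of times each screen column was written.
    """
    writes = [0] * width
    far = float("inf")
    wall_depth = [far] * width          # occlusion buffer, fed by walls only

    # Walls go front to back; a column already closed is never written again.
    for depth, x0, x1 in sorted(walls):
        for x in range(x0, x1 + 1):
            if wall_depth[x] == far:
                wall_depth[x] = depth
                writes[x] += 1

    # Sprites go back to front. They are clipped against the wall buffer
    # but do not update it, so nearer sprites repaint the same columns.
    for depth, x0, x1 in sorted(sprites, reverse=True):
        for x in range(x0, x1 + 1):
            if depth < wall_depth[x]:
                writes[x] += 1
    return writes

walls = [(10, 0, 7), (5, 0, 1)]
sprites = [(8, 0, 5), (7, 3, 6), (6, 4, 4)]
writes = render(8, walls, sprites)
```

With only walls every column is written once; adding three overlapping sprites pushes the middle column to four writes, and nothing in the wall buffer can prevent that.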

The second problem is the fact that the scaled sprite system (and now the walls) rely on a datacache to buy back cycles on large fills. Hatari has no datacache emulation, so it is not giving accurate figures here, and the cost of sprites rises dramatically in the profiler as sprites get bigger. We won't be able to profile that accurately until Hatari supports the 68030 datacache.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: E.g. stream_texture and stack_visplane were higher on the worst frame, but in general the worst frame and the whole-timedemo average are getting fairly close in terms of which functionality is most expensive.
These two are a rough indicator of scene complexity - number of state changes relating to map sectors - steps, ponds, ledges, doorways etc. stack_visplane tends to happen when any state changes for floor or ceiling between two sectors in draw order. stream_texture counts the number of unique texture state changes in the scene for floors and ceilings.
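
As a loose illustration of why those two counters track scene complexity, here is a sketch that counts state changes between consecutive sectors in draw order. The `Sector` fields and the counting rule are assumptions for the example, not the renderer's actual bookkeeping.

```python
from collections import namedtuple

# Hypothetical per-sector state; the field names are illustrative only.
Sector = namedtuple("Sector", "floor_h ceil_h floor_tex ceil_tex")

def scene_complexity(sectors_in_draw_order):
    """Return (plane_state_changes, texture_changes) for one frame."""
    plane_changes = 0
    texture_changes = 0
    previous = None
    for sector in sectors_in_draw_order:
        if previous is not None:
            if sector != previous:
                plane_changes += 1      # any floor/ceiling state differs
            textures = (sector.floor_tex, sector.ceil_tex)
            if textures != (previous.floor_tex, previous.ceil_tex):
                texture_changes += 1    # a different flat has to be bound
        previous = sector
    return plane_changes, texture_changes

room = Sector(0, 128, "FLOOR4_8", "CEIL3_5")
step = Sector(8, 128, "FLOOR4_8", "CEIL3_5")
pond = Sector(0, 128, "FWATER1", "CEIL3_5")
counts = scene_complexity([room, step, pond, pond])
```

A step changes plane state without touching textures, while the pond changes both, so a map full of small steps and ponds drives both counters up.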
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: With Doom I, the first startup seems to take ~2.5 min while it generates/caches the data. With a real Atari hard disk that's probably a bit longer. There are no resource loads during normal gameplay.
The first run will take a long time if the cache is empty. It has to do all the mipmapping/recolouring stuff, which is especially slow if there are HD textures available on disk which map to the WAD.
Eero Tamminen wrote: QuickMip stuff is high on both the CPU & DSP side. Flushing, with the stuff it calls, seems to be causing i-cache misses.
??? QuickMip is a version of the floor/ceiling shader for the most common cases (normal floors, mipmapped). The sky and liquids have separate versions. It's approximately fill-limited in most cases except maps with tons of tiny sectors.
Eero Tamminen wrote: If you have about finished the optimizations for the timedemo-compatible parts of the thinking phase, I could switch the worst-frame profiling to TIMERBASE=3 again.
TBH it's going to be tricky to auto-profile anything in that department because most such optimizations break demo replay coherence. Even if we get demo recording working on the Falcon, such a demo will immediately be invalidated by some of the changes I would make for optimization.

I'm not sure there is an answer to this one, except perhaps profiling only the level startpoints (which are not representative at all).
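
To make the replay-coherence point concrete, here is a small sketch. A demo only stores inputs, so the simulation has to reproduce every intermediate value exactly. The constant and the two friction routines below are illustrative assumptions, not code from Doom or BM: they are algebraically the same factor but round differently.

```python
FRICTION = 0xE800  # 0.90625 in 16.16 fixed point (illustrative constant)

def friction_exact(v):
    # Truncating fixed-point multiply.
    return (v * FRICTION) >> 16

def friction_shifts(v):
    # Same factor algebraically (1 - 1/16 - 1/32), different rounding.
    return v - (v >> 4) - (v >> 5)

def replay(inputs, friction):
    """Apply recorded per-tic thrust values; return position after each tic."""
    pos, vel, trace = 0, 0, []
    for thrust in inputs:
        vel = friction(vel + thrust)
        pos += vel
        trace.append(pos)
    return trace

def first_divergence(a, b):
    """Return the first tic at which two traces differ, or None."""
    for tic, (pa, pb) in enumerate(zip(a, b)):
        if pa != pb:
            return tic
    return None

demo = [100, 0, 0, 0, 0]
diverged_at = first_divergence(replay(demo, friction_exact),
                               replay(demo, friction_shifts))
```

Here the two versions already disagree on the first tic (90 versus 91), and since later tics build on that position, the recorded inputs no longer lead to the same game.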
Eero Tamminen wrote: Btw. I get one error on the console when BM starts:

Code:

InitTextureserror: could not map flat [F1_START] via BM API
Is this something known?
I have seen it, but I'm not sure yet where the fault is. It seems like Doom starts counting 'flat' texture indices in the WAD from the F1_START marker instead of the first actual flat. Whether this is an existing bug or not, I'm not sure. But it is a bug now, because it's being fed to BM as a texture translation mapping.
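
One way the symptom described above could arise is sketched below. The lump list is a made-up slice of a WAD directory and both function names are hypothetical; it only shows how counting from the lump after F_START makes the F1_START marker come out as flat 0.

```python
# Hypothetical slice of a WAD lump directory around the flats section.
LUMPS = ["F_START", "F1_START", "FLOOR0_1", "FLOOR0_3", "F1_END", "F_END"]

def flat_index_from_marker(name):
    """Index relative to the lump after F_START (F1_START becomes flat 0)."""
    first = LUMPS.index("F_START") + 1
    return LUMPS.index(name) - first

def flat_index_skipping_markers(name):
    """Index among real flats only, with *_START / *_END markers skipped."""
    flats = [l for l in LUMPS if not l.endswith(("_START", "_END"))]
    return flats.index(name)

for lump in ("F1_START", "FLOOR0_1", "FLOOR0_3"):
    print(lump, flat_index_from_marker(lump))
```

With the first scheme a translation table built from index 0 upwards will ask for a flat called F1_START, which would match the console message.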
Eero Tamminen wrote: With Doom II, BM tries to access the following nonexistent directory: BMC/SPR/VILE. The BMC/FLT subdirectory created under the cache directory is empty even after running both Doom I & Doom II.
Not sure what's going on there - VILE?? is a sprite, not a directory. I did fix a bug recently relating to string8 conversion to filenames but it only affects HD texture loading, and there are none for the VILE sprite. I'll look later.

The FLT\ cache directory will remain empty for now - I haven't got around to caching the floor textures. They are still generated at runtime (and slow down loading accordingly).
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

Here's the same data for Doom II.

Whole timedemo, CPU side:

Code:

Time spent in profile = 76.79584s.
...
Executed instructions:
   7.82%   7.90%   8.32%    10604911  10708703  11276857   _BM_P_CrossBSPNode
   6.50%                     8813693                       R_AdvanceSurface_TMip2
   6.09%   6.10%   6.29%     8256622   8265670   8528958   _R_PointInSubsector
   5.52%                     7482941                       R_AdvanceSurface_NMip0
   5.37%   5.38%  33.27%     7278744   7290796  45099937   _P_RunThinkers
   5.26%                     7137557                       R_AdvanceSurface_TMip0
   4.52%                     6121373                       R_VisPlaneSkyShader
   4.27%   4.28%  13.05%     5784277   5798608  17687692   _P_CheckPosition
   2.99%                     4055421                       R_AdvanceSurface_TMip1
   2.70%                     3660966                       R_AdvanceSurface_TMip3
   2.56%                     3476590                       R_VisPlaneShaderWarp
   2.56%   2.56%   2.69%     3470122   3477338   3645132   _R_DrawColumn
   2.21%   2.21%   2.21%     2994077   2995857   2995857   _BM_A_Mux3x2
   2.13%   2.14%   2.31%     2892354   2899677   3133794   _PIT_CheckThing
   2.12%   2.13%   2.30%     2880416   2884693   3118471   _V_DrawPatch
   2.05%                     2778391                       R_VisPlaneShaderQuickMip
   1.86%                     2517742                       R_BSPHyperPlane
   1.79%   1.80%   1.96%     2427000   2435859   2658515   stream_texture
   1.37%   1.37%   1.37%     1856122   1857162   1857162   _BM_A_Mux2x2
   1.35%                     1827082                       R_AdvanceSurface_NMip3
   1.32%   1.33%   1.43%     1796108   1799312   1932754   _P_UpdateSpecials
   1.28%                     1737562                       R_DrawTSurface_Masked1
   1.05%   1.05%   3.81%     1425594   1428978   5167834   _R_DrawVisSprite
   1.03%                     1390094                       R_AddLine_loop
   1.02%   1.02%   1.10%     1381492   1384795   1486430   R_ViewTestSpriteLines
   0.98%   0.98%   1.07%     1332894   1335343   1450196   R_AddSpriteSpans
   0.97%   0.97%   0.97%     1310742   1311602   1311602   _BM_A_Mux1x2
   0.88%   0.89%   1.00%     1196467   1200011   1351137   _PIT_CheckLine
   0.84%                     1134760                       R_SpriteColumnShader_Masked2
   0.82%   0.84%   0.88%     1109655   1134982   1187881   stack_visplane_area
   0.77%   0.77%   9.14%     1041176   1044253  12385409   _BM_P_CheckSight
   0.70%   0.70%   8.55%      943806    945574  11598169   _P_LookForPlayers
   0.67%   0.67%   0.70%      902900    904503    943594   R_SetSubSectorLuma
   0.59%                      804014                       R_AdvanceSurface_NMip2
   0.58%                      787953                       build_ssector
   0.58%   0.58%  13.79%      782750    785441  18702401   _P_Move
   0.55%   0.55%   1.37%      747034    749028   1862292   get_ssector
   0.54%   0.54%   0.56%      732802    733997    757066   init_stategroups
   0.52%   0.53%  15.06%      708808    712134  20413976   _P_TryMove
...
Used cycles:
   8.46%   8.57%   9.01%   104172448 105610296 111000388   _BM_P_CrossBSPNode
   4.99%                    61526524                       R_AdvanceSurface_TMip2
   4.77%   4.79%  34.50%    58736536  59000952 425097356   _P_RunThinkers
   4.65%   4.67%  12.89%    57287348  57550596 158786028   _P_CheckPosition
   4.31%                    53157716                       R_AdvanceSurface_NMip0
   4.07%   4.09%   4.29%    50120808  50337108  52875856   _R_PointInSubsector
   4.02%                    49518132                       R_AdvanceSurface_TMip0
   3.27%                    40232000                       R_VisPlaneSkyShader
   3.02%                    37174892                       R_VisPlaneShaderWarp
   2.99%   3.00%   3.18%    36830660  37007772  39150112   stream_texture
   2.70%   2.71%   2.89%    33295392  33448176  35646288   _PIT_CheckThing
   2.60%   2.61%   2.74%    31992468  32135896  33744216   _R_DrawColumn
   2.47%                    30435508                       R_VisPlaneShaderQuickMip
   2.30%                    28365352                       R_AdvanceSurface_TMip1
   2.15%   2.15%   2.15%    26488644  26515700  26515700   _BM_A_Mux3x2
   2.10%                    25865264                       R_AdvanceSurface_TMip3
   2.02%                    24861936                       R_BSPHyperPlane
   1.81%   1.81%   1.91%    22239928  22316944  23568652   _P_UpdateSpecials
   1.77%   1.77%   1.98%    21774956  21867144  24337900   _V_DrawPatch
   1.38%   1.38%   1.38%    17039628  17055436  17055436   _BM_A_Mux2x2
   1.38%   1.39%   1.52%    17022900  17094636  18742504   _PIT_CheckLine
   1.22%                    14984996                       R_AddLine_loop
   1.21%   1.22%  10.27%    14919440  14978924 126586288   _BM_P_CheckSight
   1.16%   1.16%   1.24%    14244816  14311724  15263664   R_ViewTestSpriteLines
   1.14%   1.15%   3.96%    14078120  14144984  48844176   _R_DrawVisSprite
   1.14%   1.14%   1.14%    14063676  14076748  14076748   _BM_A_Mux1x2
   1.14%   1.14%   1.24%    14027448  14081064  15298956   R_AddSpriteSpans
   1.08%                    13284316                       R_AdvanceSurface_NMip3
   1.01%   1.15%   1.19%    12464728  14200136  14708300   stack_visplane_area
   0.96%   0.97%  14.18%    11853008  11906824 174710764   _P_Move
   0.89%   0.90%  15.06%    10988412  11044436 185555400   _P_TryMove
   0.88%                    10886872                       R_DrawTSurface_Masked1
   0.88%                    10822052                       build_ssector
   0.76%                     9308440                       copy16_d
   0.69%   0.69%   9.57%     8509012   8545492 117898520   _P_LookForPlayers
   0.65%                     8046744                       copy256
   0.64%                     7831256                       copy256_d
   0.62%                     7659756                       R_SpriteColumnShader_Masked2
   0.60%   0.61%   1.68%     7425596   7459856  20669880   get_ssector
   0.59%   0.59%   0.62%     7295684   7328064   7697088   R_SetSubSectorLuma
   0.56%                     6889360                       R_StackTransparentSurface
   0.54%   0.54%  27.05%     6685372   6709656 333230124   _P_SetMobjState
...
Instruction cache misses:
  12.75%  12.78%  23.73%     2450727   2456834   4561713   _P_CheckPosition
   7.22%   7.25%  57.58%     1387437   1393988  11070116   _P_RunThinkers
   5.71%   6.08%   6.23%     1097846   1168064   1196894   _BM_P_CrossBSPNode
   4.71%   4.73%   4.81%      904836    910239    924926   _R_PointInSubsector
   3.85%   3.85%   3.98%      739745    740924    764672   R_AddSpriteSpans
   3.80%   3.81%   4.02%      730967    732681    772390   _PIT_CheckLine
   3.44%   3.44%   9.68%      660550    661850   1861822   _BM_P_CheckSight
   2.82%   2.84%   2.91%      542833    546494    559311   _PIT_CheckThing
   2.53%   2.53%  28.67%      485599    487135   5511377   _P_TryMove
   2.22%   2.23%  26.84%      427518    428860   5161334   _P_Move
   2.22%   2.23%   3.67%      427153    428803    706091   _R_DrawVisSprite
   1.97%   1.97%  44.96%      378401    378930   8645156   _P_SetMobjState
   1.89%                      363658                       build_ssector
   1.74%   1.74%  10.54%      334245    335136   2026920   _P_LookForPlayers
   1.62%                      310587                       R_AddLine_loop
   1.54%   1.55%   1.57%      296006    297601    302548   R_ViewTestSpriteLines
   1.32%   1.33%   1.38%      253037    256562    265374   _R_DrawColumn
   1.21%   1.21%  22.89%      232003    232519   4401117   _P_NewChaseDir
   1.19%   1.19%   2.76%      229030    229495    530899   _V_CopyRect
   1.16%   1.17%   1.44%      222748    225022    276277   _V_DrawPatch
   1.11%   1.12%  29.57%      214284    214563   5684766   _A_Chase
   1.07%   1.07%   6.69%      205551    206452   1286290   R_FlushDeferredSurfaces
   1.05%   1.05%   8.94%      201342    201612   1718419   R_SubSectorTryFlush
   0.96%                      185309                       R_BSPHyperPlane
   0.95%                      183156                       R_AddOverlappingSprites
   0.91%   0.92%   1.80%      175809    176221    345651   _P_SetThingPosition
   0.83%   0.83%   0.84%      159586    160244    162389   add_wall_segment
   0.80%   0.80%   4.89%      153352    153803    939280   _STlib_drawNum
   0.73%   0.74%  12.96%      141276    141577   2491036   _A_Look
   0.72%   0.73%  10.42%      139266    139741   2003112   _P_CheckSight
DSP side:

Code: Select all

Used cycles:
  48.29%                  1189949724                       command_base
  18.24%  19.48%  19.48%   449379294 479969092 479969092   R_DoColumnPerspCorrect
   4.87%                   119998916                       ALGO_P_CrossBSPNode
   3.39%                    83597630                       R_VPRenderSky
   3.15%                    77630598                       R_VPLoadTexture
   2.94%   2.94%   2.94%    72481156  72481156  72481156   VPRenderSpanWarp
   2.62%                    64641630                       P_CrossSubsector_body
   2.21%   2.21%   2.21%    54484914  54484914  54484914   VPRenderSpanQuickMip
   1.88%   1.88%   1.88%    46356406  46356406  46356406   extract_subvisplane
   1.69%                    41575386                       R_ViewTestAddLine
   1.44%                    35372808                       R_DoColumnTextureUV
   0.99%   1.02%   4.96%    24319178  25056370 122239306   AddLowerWall
   0.99%                    24276688                       project_node
   0.89%   0.89%   0.89%    21873920  21884536  21884536   InterceptVectorsUF
   0.73%   0.77%  15.81%    18077728  19017240 389545590   AddMidWall
   0.66%   0.00%   0.00%    16230648     51224     51224 * Divs48_Real
   0.65%                    16102746                       R_CheckBBoxPair
   0.57%                    14036102                       R_VPRenderPlane
   0.56%   0.58%   1.70%    13900012  14369390  41824068   AddUpperWall
Interestingly, about the same amount of DSP time is free / looping in command_base as with Doom I. Here CrossBSPNode is higher than QuickMip, which I guess is expected due to the more complex levels.
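
To compare lists like the ones above between runs, the lines can be split with a few lines of Python. This is only a sketch: the names `parse_profile_line` and `top_symbols` are made up here, and since the meaning of the three percentage/count columns is not spelled out in this thread, they are kept as plain lists in the order printed.

```python
def parse_profile_line(line):
    """Split one profile list line into (symbol, percentages, counts).

    Lines carry either one or several percentage/count columns, and the
    symbol name is always the last token.  Lines that do not start with a
    percentage ("...", headings) yield None.
    """
    tokens = line.split()
    if not tokens or not tokens[0].endswith("%"):
        return None
    percents = [float(t[:-1]) for t in tokens if t.endswith("%")]
    counts = [int(t) for t in tokens if t.isdigit()]
    symbol = tokens[-1]  # a "*" marker before the name is simply ignored
    return symbol, percents, counts


def top_symbols(lines, n=3):
    """Return the n symbols with the largest first count column."""
    rows = [r for r in map(parse_profile_line, lines) if r]
    rows.sort(key=lambda r: r[2][0], reverse=True)
    return [symbol for symbol, _percents, _counts in rows[:n]]
```

Feeding the "Used cycles" sections of two runs through `top_symbols` gives a quick side-by-side view of which symbols moved.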

After the first A_Chase() call there are a few disk reads, though not through load/read_resource():

Code: Select all

- 0x3dcba: _R_RenderPlayerView (return = 0x22e70)
- 0x40a02: _R_DrawMasked (return = 0x3de2c)
- 0x403d4: _R_DrawPSprite (return = 0x40a92)
- 0x3ff54: _R_DrawVisSprite (return = 0x4054e)
- 0x41152: _W_CacheLumpNum (return = 0x3ff72)
- 0x608dc: ___read (return = 0x4122e)
GEMDOS 0x3F Fread(64, 3400, 0x3f2c8c)
...
GEMDOS 0x3F Fread(64, 1156, 0x3f3f90)
...
GEMDOS 0x3F Fread(64, 8860, 0x3f442c)
...
GEMDOS 0x3F Fread(64, 10368, 0x3f66e0)
...
GEMDOS 0x3F Fread(64, 10668, 0x3f9680)
...
GEMDOS 0x3F Fread(64, 1668, 0x3fc784)
...
GEMDOS 0x3F Fread(64, 8180, 0x3fd1a4)
...
GEMDOS 0x3F Fread(64, 2684, 0x3ff3cc)
...
GEMDOS 0x3F Fread(64, 8204, 0x3fff14)
...
GEMDOS 0x3F Fread(64, 2308, 0x401f38)
Any comments on how to catch the names of these sprites?


The VILE warnings come from BM trying to do:

Code: Select all

GEMDOS 0x3D Fopen("BMC\SPR\VILE\1.BMT", read-only)
No GEMDOS dir '/home/linuxdoom/autoprofile/BMC/SPR/VILE'
GEMDOS 0x42 Fseek(9525352, 65, 0)
GEMDOS 0x3F Fread(65, 3120, 0x96fc70)
GEMDOS 0x3C Fcreate("BMC\SPR\VILE\1.BMT", 0x0)
No GEMDOS dir '/home/linuxdoom/autoprofile/BMC/SPR/VILE'
But it's not creating the VILE subdirectory first.

[EDIT] fixed typos.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: Interestingly, about the same amount of DSP time is free / looping in command_base as with Doom I. Here CrossBSPNode is higher than QuickMip, which I guess is expected due to the more complex levels.
More complex levels (and/or more objects on the map using their eyes) will cause this. It's the enemy count which determines the number of rays cast, and the map which determines how the ray gets chopped up. So it can get quite bad if both are increased together.
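
The way enemy count and map complexity multiply can be illustrated with a toy model. The sketch below is not the BadMood or DSP code; it only mimics the general shape of a BSP sight check (walk the side the ray starts on, and the other side too if the ray straddles the partition) and counts visited nodes. The class and function names are invented for the example.

```python
class Node:
    """One BSP node: a partition line plus front/back children.

    A child of None stands for a leaf (subsector) in this toy tree.
    """
    def __init__(self, x, y, dx, dy, front=None, back=None):
        self.x, self.y, self.dx, self.dy = x, y, dx, dy
        self.children = (front, back)


def side(node, px, py):
    """Return 0 if the point is on the front side of the partition, else 1."""
    cross = (px - node.x) * node.dy - (py - node.y) * node.dx
    return 0 if cross >= 0 else 1


def nodes_visited(node, x1, y1, x2, y2):
    """Count the BSP nodes a sight ray from (x1, y1) to (x2, y2) touches."""
    if node is None:
        return 0
    s1 = side(node, x1, y1)
    s2 = side(node, x2, y2)
    visited = 1 + nodes_visited(node.children[s1], x1, y1, x2, y2)
    if s1 != s2:  # the ray straddles the partition: both sides are walked
        visited += nodes_visited(node.children[s2], x1, y1, x2, y2)
    return visited


def sight_cost(root, viewer, monsters):
    """Total nodes visited when every monster casts one ray at the viewer."""
    vx, vy = viewer
    return sum(nodes_visited(root, mx, my, vx, vy) for mx, my in monsters)
```

In this model the total is (number of rays) x (nodes per ray), so doubling the monsters doubles the cost, and a deeper tree raises the per-ray term at the same time.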
Eero Tamminen wrote: After the first A_Chase() call there are a few disk reads, though not through load/read_resource():

Code: Select all

- 0x3dcba: _R_RenderPlayerView (return = 0x22e70)
- 0x40a02: _R_DrawMasked (return = 0x3de2c)
- 0x403d4: _R_DrawPSprite (return = 0x40a92)
- 0x3ff54: _R_DrawVisSprite (return = 0x4054e)
- 0x41152: _W_CacheLumpNum (return = 0x3ff72)
- 0x608dc: ___read (return = 0x4122e)
GEMDOS 0x3F Fread(64, 3400, 0x3f2c8c)
Any comments on how to catch the names of these sprites?
A bit more tricky than before, if the files come from the local cache. They pass through here in that case... it takes a few instructions until a4 is loaded with the address of the texturedef (name). You could set a breakpoint there or we could introduce a special label to make it easier to catch.

Code: Select all

*-------------------------------------------------------*	
D_TextureCacheIn:
*-------------------------------------------------------*	
	movem.l		d2-d7/a0-a6,-(sp)
*-------------------------------------------------------*
	move.w		cache_entry(a0),d0
	move.l		resourcedef_table,a4
	move.l		(a4,d0.w*4),a4
Eero Tamminen wrote: The VILE warnings come from BM trying to do:

Code: Select all

GEMDOS 0x3D Fopen("BMC\SPR\VILE\1.BMT", read-only)
No GEMDOS dir '/home/linuxdoom/autoprofile/BMC/SPR/VILE'
GEMDOS 0x42 Fseek(9525352, 65, 0)
GEMDOS 0x3F Fread(65, 3120, 0x96fc70)
GEMDOS 0x3C Fcreate("BMC\SPR\VILE\1.BMT", 0x0)
No GEMDOS dir '/home/linuxdoom/autoprofile/BMC/SPR/VILE'
But it's not creating the VILE subdirectory first.
That just looks wrong to me. VILE1 (etc) is a sprite lump, not a directory. Likely a bug in filename/path handling, possibly the non-zero-terminated strings again...
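
For context on the "non-zero-terminated strings" remark: WAD directory names are fixed 8-byte fields padded with NULs, so a name that uses all 8 bytes has no terminator at all. The sketch below shows one defensive way to decode such a field. The function names and the `.bmp` naming are hypothetical, and the actual BM bug is not shown in this thread.

```python
def lump_name(raw: bytes) -> str:
    """Decode an 8-byte, NUL-padded WAD lump name.

    Names that use all 8 bytes have no terminating NUL, so the field is cut
    at 8 bytes first and only then at the first NUL, if there is one.
    """
    field = raw[:8]
    return field.split(b"\0", 1)[0].decode("ascii", errors="replace")


def cache_filename(raw: bytes, ext: str = ".bmp") -> str:
    """Build a file name from a lump name (hypothetical naming scheme)."""
    return lump_name(raw) + ext
```

Reading past the 8th byte is what would pull stray bytes from whatever follows the field into the name.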
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

Ok, the disk loads might be for VILE stuff as its cache creation failed. I'll check the loads again after you've fixed the VILE caching.

Here's the worst frame for the Doom II timedemo.

Rendering, CPU side:

Code: Select all

Time spent in profile = 0.24069s.
...
Executed instructions:
  22.46%                      105599                       R_SpriteColumnShader_Masked2
  14.07%                       66163                       R_AdvanceSurface_NMip0
   9.58%                       45062                       R_VisPlaneSkyShader
   6.87%                       32305                       R_AdvanceSurface_TMip3
   6.53%   6.54%   6.54%       30712     30732     30732   _BM_A_Mux3x2
   3.96%                       18630                       R_AdvanceSurface_TMip2
   3.81%                       17928                       R_VisPlaneShaderWarp
   3.50%                       16478                       R_DrawTSurface_Masked1
   2.75%   2.76%   2.76%       12944     12978     12978   stream_texture
   2.71%                       12728                       R_BSPHyperPlane
   2.55%                       11984                       R_AdvanceSurface_NMip3
   2.00%                        9403                       R_StackTransparentSurface
   1.85%   1.86%   1.95%        8719      8726      9167   R_AddSpriteSpans
   1.49%                        6992                       R_AddLine_loop
   1.47%                        6920                       R_VisPlaneShaderQuickMip
   1.35%   1.38%   1.38%        6365      6481      6481   stack_visplane_area
   1.33%   1.33%   1.33%        6257      6261      6261   R_ViewTestSpriteLines
   0.93%                        4372                       R_AdvanceSurface_TMip4
   0.84%                        3946                       R_DrawSurface_NMip
   0.84%   0.84%   0.84%        3931      3931      3931   R_SetSubSectorLuma
   0.82%                        3857                       build_ssector
   0.76%   0.76%   0.76%        3569      3569      3569   add_wall_segment
   0.65%   0.65%   0.65%        3079      3079      3079   init_stategroups
   0.65%                        3053                       R_AdvanceSurface_NMip2
   0.50%   0.50%   0.50%        2351      2351      2351   _BM_A_Mux2x2
...
Instruction cache misses:
  17.40%  17.41%  18.16%        4898      4901      5113   R_AddSpriteSpans
   6.28%                        1767                       build_ssector
   5.79%   5.80%   5.80%        1629      1632      1632   R_ViewTestSpriteLines
   5.37%                        1511                       R_AddLine_loop
   3.89%   3.95%  23.06%        1096      1113      6493   R_FlushDeferredSurfaces
   3.56%   3.56%  31.76%        1001      1001      8942   R_SubSectorTryFlush
...
Visits/calls:
   5.70%                 178             R_AddLine_loop
   5.67%                 177             R_AddLine_invisible
   4.77%                 149             R_AdvanceSurface_TMip3
   3.59%                 112             R_AdvanceSurface_NMip3
   3.52%                 110             R_BSPHyperPlane
   3.40%                 106             R_AdvanceSurface_NMip0
   3.17%   4.84%          99       151   render_wall
   2.72%   2.72%          85        85   cache_resource
   2.72%                  85             R_AdvanceSurface_TMip2
At least in Hatari, the worst Doom II timedemo frame is better than the worst Doom I timedemo frame... Based on your comment about the data cache, that might not hold true on a real Falcon though. :-)

DSP side:

Code: Select all

Used cycles:
  35.83%                     2766732                       command_base
  22.17%  23.97%  23.97%     1711970   1851238   1851238   R_DoColumnPerspCorrect
   7.59%                      585804                       R_VPRenderSky
   5.45%   5.45%   5.45%      421038    421038    421038   VPRenderSpanWarp
   5.06%                      390570                       R_VPLoadTexture
   4.02%                      310650                       R_ViewTestAddLine
   2.77%   2.77%   2.77%      213642    213642    213642   extract_subvisplane
   2.68%                      207096                       R_DoColumnTextureUV
   2.63%   2.63%   2.63%      203100    203100    203100   VPRenderSpanQuickMip
   1.71%   1.73%   6.59%      132116    133522    508670   AddLowerWall
   1.63%                      126206                       project_node
   1.07%   1.09%   3.86%       82708     83830    297706   AddUpperWall
   1.02%                       78882                       R_CheckBBoxPair
   0.90%   0.93%  18.23%       69774     71846   1408152   AddMidWall
   0.76%   0.81%   2.36%       58788     62528    181910   AddTransWall
   0.63%   0.63%   0.63%       48632     48632     48632   R_BufferSurface
   0.51%                       39696                       R_VPRenderPlane
I guess 1/3 of the DSP is free due to the large amount of time going to sprite rendering.


Thinking, CPU side:

Code: Select all

Time spent in profile = 0.25277s.
...
Executed instructions:
  17.87%  18.05%  21.41%       73464     74219     88016   _BM_P_CrossBSPNode
  13.47%  13.49%  14.37%       55373     55451     59090   _R_PointInSubsector
   8.52%   8.53%  69.72%       35024     35083    286663   _P_RunThinkers
   8.51%   8.52%  26.65%       34997     35042    109582   _P_CheckPosition
   6.64%   6.65%   6.65%       27306     27326     27326   _BM_A_Mux3x2
   5.30%   5.30%  13.80%       21774     21801     56725   _P_PathTraverse
   4.11%   4.12%   4.12%       16906     16937     16937   _PIT_CheckThing
   3.78%   3.80%   4.70%       15555     15615     19308   _PIT_AddLineIntercepts_L
   3.61%   3.62%   4.64%       14841     14879     19067   _V_DrawPatch
   2.05%   2.06%   2.06%        8440      8475      8475   _P_UpdateSpecials
   1.97%   1.97%   2.09%        8093      8107      8587   _PIT_CheckLine
   1.57%   1.57%  22.98%        6464      6471     94487   _BM_P_CheckSight
   1.57%   1.57%   1.57%        6452      6452      6452   _R_DrawColumn
   1.27%   1.27%  19.10%        5226      5233     78544   _P_LookForPlayers
   1.19%   1.20%   2.78%        4908      4935     11419   _R_DrawVisSprite
   1.14%   1.14%   1.14%        4702      4702      4702   _BM_A_Mux2x2
   1.13%   1.14%  30.78%        4627      4674    126537   _P_TryMove
   1.12%   1.12%   1.12%        4609      4616      4616   _P_PointOnDivlineSide
   1.11%   1.12%  26.10%        4574      4601    107320   _P_Move
   1.10%   1.10%   2.22%        4514      4521      9137   _PIT_AddThingIntercepts
   0.82%   0.82%  53.70%        3358      3358    220784   _P_SetMobjState
   0.68%   0.69%   1.52%        2810      2830      6233   _PTR_ShootTraverse
   0.55%                        2280                       copy16_d
   0.53%   0.54%  23.13%        2198      2205     95088   _P_NewChaseDir
DSP side:

Code: Select all

Used cycles:
  76.56%                     6209388                       command_base
  10.86%                      880410                       ALGO_P_CrossBSPNode
   7.06%                      572658                       P_CrossSubsector_body
   1.89%   1.89%   1.89%      153374    153478    153478   InterceptVectorsUF
   1.28%                      103930                       Divs48_Real
   1.26%                      102284                       ALGO_P_LineIntercept
   0.74%   0.74%   0.74%       59662     59662     59662   TestLineSegVectorBisection
There's only a 0.012s difference between how long the slowest thinking and rendering parts of Doom II take in Hatari.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Hehe - just tried launching the latest BMVIEW on F030 + VGA and shrunk the window to 80x60.

Framerate reading when facing a wall is 62 FPS... maintains about 20-25 FPS for most of the map, drops to 12 FPS if you try to look from the nukeage/slime area across the courtyard.

Anyone want to try making an Atari game *without* the Doom game code? ;)
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:Framerate reading when facing a wall is 62 FPS... maintains about 20-25 FPS for most of the map, drops to 12 FPS if you try to look from the nukeage/slime area across the courtyard.

Anyone want to try making an Atari game *without* the Doom game code? ;)
By designing a "Doom" level suitable for Falcon/BadMood and adding their own game logic on top of that? Maybe it's time to publish the code to a more public HG server [1], and find out what happens? :-)

[1] Does e.g. atariforge support Mercurial?
shoggoth
Nature
Posts: 1447
Joined: Tue Aug 01, 2006 9:21 am
Location: Halmstad, Sweden

Re: Bad Mood : Falcon030 'Doom'

Post by shoggoth »

Imagine a multiplayer game without bots, then :-D No AI to think about there, "just" some clever networking code.
Ain't no space like PeP-space.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

shoggoth wrote:Imagine a multiplayer game without bots, then :-D No AI to think about there, "just" some clever networking code.
That could work.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

I was checking "doomu.wad" timedemo behavior in BM against Linux PrBoom and there are a few differences:
* Pink monsters aren't transparent/invisible in BM like they are in PrBoom.
* After player goes down the stairs, BM plays the demo wrong.

BM cannot play the "doom1.wad" (shareware) timedemo; it says the demo is for a different version, although PrBoom plays it fine.

(PrBoom plays "doom2.wad" timedemo completely wrong though :-)).
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote:I was checking "doomu.wad" timedemo behavior in BM against Linux PrBoom and there are a few differences:
* Pink monsters aren't transparent/invisible in BM like they are in PrBoom.
I haven't quite tied up replacement shaders for sprites but it will happen. I wanted to get the floors done first (liquids) and that's finished now.
Eero Tamminen wrote: * After player goes down the stairs, BM plays the demo wrong.
Yes. This is a consequence of changing the original code, and involving DSP in game vector calcs. Any numerical differences at all will accumulate and cause replay to drift, because the demo only records player input actions and everything else must 'simulate' as it did on the original machine.
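
The accumulation effect can be shown with a deliberately tiny model. The sketch below replays the same recorded inputs through two "engines" that differ only in how a 16.16 fixed-point multiply rounds. The movement rule, the friction value and all names are illustrative assumptions, not BadMood or Doom code.

```python
FRACBITS = 16
FRACUNIT = 1 << FRACBITS
FRICTION = 0xE800  # 0.90625 in 16.16 fixed point; illustrative value


def mul_truncate(a, b):
    """Fixed-point multiply that drops the low bits."""
    return (a * b) >> FRACBITS


def mul_round(a, b):
    """Fixed-point multiply that rounds to nearest instead."""
    return (a * b + (FRACUNIT >> 1)) >> FRACBITS


def replay(thrusts, fixed_mul):
    """Replay recorded inputs; return the position after every tic."""
    pos = vel = 0
    trace = []
    for thrust in thrusts:
        vel = fixed_mul(vel + thrust, FRICTION)
        pos += vel
        trace.append(pos)
    return trace


demo = [3000] * 35  # one recorded input per tic
same = replay(demo, mul_truncate) == replay(demo, mul_truncate)
drift = replay(demo, mul_round)[-1] - replay(demo, mul_truncate)[-1]
```

Here `same` is true because an unchanged engine reproduces itself exactly, while `drift` is non-zero and keeps growing with the length of the demo. Once positions differ, anything that depends on them (collisions, line of sight, monster decisions) can take a different branch.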

Some of the more severe drift cases are conditioned by the TIMEBASE_CONTROL flag but there are essential optimizations (like P_CheckSight) which remain on, and don't match the original 100%.

You can get old behaviour back by enabling the ORIGINAL_VERSION flag - if it still works - but even then the Linux1.10 distro isn't properly compatible with demos recorded from other engines (even Dos Doom) and the demos still have small desync problems (especially on maps with those flying skulls). As it is I had to hack the demo format version number to get the standard demos to load at all.

This isn't really fixable but playing PC demos won't be part of the final release anyway.
Eero Tamminen wrote: BM cannot play the "doom1.wad" (shareware) timedemo; it says the demo is for a different version, although PrBoom plays it fine.

(PrBoom plays "doom2.wad" timedemo completely wrong though :-)).
The doom game code has a version check for the demo lump. I suspect PrBoom has bypassed it, whereas I just changed the version number. Both will suffer from the same problems though - only demos recorded by *that* engine will be sure to work. Changes to game code cause desync.
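
As a rough illustration of such a check, the sketch below decodes the start of a demo lump. It assumes the commonly documented 13-byte header layout of Doom v1.4+ demos (version byte first, 109 for v1.9), which this thread does not confirm. `parse_demo_header` and `version_matches` are names invented for the example.

```python
import struct
from collections import namedtuple

DemoHeader = namedtuple(
    "DemoHeader",
    "version skill episode map deathmatch respawn fast nomonsters "
    "consoleplayer playeringame",
)


def parse_demo_header(lump: bytes) -> DemoHeader:
    """Decode the 13-byte header assumed for Doom v1.4+ demo lumps."""
    if len(lump) < 13:
        raise ValueError("demo lump too short")
    fields = struct.unpack_from("13B", lump, 0)
    present = tuple(bool(b) for b in fields[9:13])
    return DemoHeader(*fields[:9], playeringame=present)


def version_matches(lump: bytes, engine_version: int) -> bool:
    """Mimic a strict check: the first byte must equal the engine's version."""
    return parse_demo_header(lump).version == engine_version
```

With a strict comparison like this, a v1.9 demo (109) is rejected by an engine that identifies itself as 110, which is why either the check or the version constant has to be changed to load the stock demos at all.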

It's also worth noting that some engine ports (ZDoom, Boom etc.) have significant modifications from the original game which add new features and change the WAD spec.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

Based on PrBoom output, the shareware WAD has a different timedemo, so it would be nice to have it working.

I also tried latest FreeDoom:
http://www.nongnu.org/freedoom/download.html

The "Doom II" replacement WAD gives this, both from the timedemo and when trying to play myself:

Code: Select all

Demo is from a different game version!
Error: Z_Malloc: failed on allocation of 127436 bytes
The "Ultimate Doom" replacement nearly works:
http://savannah.nongnu.org/download/fre ... latest.zip

It loaded with a couple of warnings:

Code: Select all

WARNING: have to clip 4 chars from 'FLOOR4_8=�.bmp' base!
WARNING: have to clip 4 chars from 'TLITE6_5]�.bmp' base!
WARNING: have to clip 4 chars from 'TLITE6_1=�.bmp' base!
WARNING: have to clip 4 chars from 'FLOOR5_1M�.bmp' base!
WARNING: have to clip 4 chars from 'TLITE6_4M�.bmp' base!
And the timedemo started running. However, after a while the area around the edges of the screen got messed up and eventually demo playback stopped with this:

Code: Select all

Error: NetUpdate: netbuffer->numtics > BACKUPTICS
It would be nice to have support for FreeDoom, at least the Ultimate version replacement so that everybody can get full WADs.


Btw, should (Doom II) WAD overlays work?

I tried a couple of old Doom II PWADs with doom2.wad and BM got stuck here:

Code: Select all

GEMDOS 0x42 Fseek(952923136, 67, 0)
Finalizing costs for 11 non-returned functions:
- 0x4d5ba: read_resource_header (return = 0x516da)
- 0x51622: D_CacheRegisterSpritesMarkedSet (return = 0x4ed08)
- 0x4ece0: W_InitCacheSpriteDefs (return = 0x4eb60)
- 0x4eafc: W_InitCacheDefs (return = 0x4addc)
- 0x4acd8: _BM_E_OpenWADs (return = 0x24dd4)
- 0x247c6: _D_DoomMain (return = 0x4a53a)
- 0x4a47c: _BM_S_AppEntryPoint (return = 0x4a6f6)
- 0x4a6b4: _BM_S_EntryPoint (return = 0x4a446)
- 0x4a24e: _main (return = 0x61d26)
In both cases there was an invalid Fseek() offset. This one was from the second (earth.wad):

Code: Select all

GEMDOS 0x42 Fseek(605905920, 67, 0)
Then I tried the smallest Doom II WAD I found, "mtfactor.wad", and while that loaded fine, it gave this when I tried to start a new level:

Code: Select all

Zone: used memory: 0x91210
Zone: free memory: 0x6edf0
Error: Z_Malloc: failed on allocation of 138456 bytes
Are there any nice PWADs for Doom I (in case they would need less RAM)?
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote:Based on PrBoom output, the shareware WAD has a different timedemo, so it would be nice to have it working.
It does work - it's just that you have an old version of shareware WAD :-) Get the v1.9 revision (I think I sent you a link for this in past emails).

I don't really want to try to support all the older versions of the WADs (1.666, 1.7, 1.8 etc) because it's extra trouble and there are 1.9 versions of all of them now, including a free patch for the shareware one. It might just be a case of disabling the version check, but then again it might not.
Eero Tamminen wrote: I also tried latest FreeDoom:
http://www.nongnu.org/freedoom/download.html

The "Doom II" replacement WAD gives this, both from the timedemo and when trying to play myself:

Code: Select all

Demo is from a different game version!
Error: Z_Malloc: failed on allocation of 127436 bytes
I assume that's an open-source version of the Doom II IWAD or something? I don't know much about that but it looks like it's falling foul of the version check as well.

There's probably an argument for supporting open-source replacements but I really don't know what's hiding in there. The fact it doesn't pass a v1.9 version check is not encouraging :) (Maybe it's just v1.10, which is what our sourcebase actually is).
Eero Tamminen wrote: The "Ultimate Doom" replacement nearly works:
http://savannah.nongnu.org/download/fre ... latest.zip

It loaded with a couple of warnings:

Code: Select all

WARNING: have to clip 4 chars from 'FLOOR4_8=�.bmp' base!
WARNING: have to clip 4 chars from 'TLITE6_5]�.bmp' base!
WARNING: have to clip 4 chars from 'TLITE6_1=�.bmp' base!
WARNING: have to clip 4 chars from 'FLOOR5_1M�.bmp' base!
WARNING: have to clip 4 chars from 'TLITE6_4M�.bmp' base!
Those warnings are caused by a bug in the cache file handling which I mentioned recently and which is now fixed. It should be in the repo now.
Eero Tamminen wrote: And the timedemo started running. However, after a while the area around the edges of the screen got messed up and eventually demo playback stopped with this:

Code: Select all

Error: NetUpdate: netbuffer->numtics > BACKUPTICS
The corruption is another bug which I mentioned, and fixed today but I think it hasn't been checked in yet. I was working on yet *another* bug which was a Hatari/real-hardware divergence thing. :-z
Eero Tamminen wrote: It would be nice to have support for FreeDoom, at least the Ultimate version replacement so that everybody can get full WADs.
Yes probably should be supported. Hopefully it's just a version check thing and not a format change.
Eero Tamminen wrote: Btw, should (Doom II) WAD overlays work?
All PWADs should work if their associated IWADs also work. I have found plenty which don't work but they tend to fall into categories:

- weird, inconsistent bugs in maps which Doom tolerates but BM doesn't know what to do with. fixed some of these recently but some are confounding.
- PWAD is ZDoom (or other) custom format, won't load - warnings + crashes
- map is way too complex, causes DSP buffer overflow (seems to be rare now - haven't seen it recently - famous last words)
- nonstandard texture sizes, or other weird problems with resources that BM doesn't like
- other weird corner cases
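
Some of these categories can be screened for before loading a PWAD. The sketch below is a hypothetical helper, not part of BadMood. It assumes the standard WAD layout (12-byte little-endian header, 16-byte directory entries) and flags flats between `F_START` and `F_END` that are not 64x64 bytes. Custom-format WADs may use other markers that this does not handle.

```python
import struct


def read_directory(wad: bytes):
    """Return (ident, [(name, filepos, size), ...]) for a WAD image."""
    ident, numlumps, infotableofs = struct.unpack_from("<4sii", wad, 0)
    if ident not in (b"IWAD", b"PWAD"):
        raise ValueError("not a WAD file")
    entries = []
    for i in range(numlumps):
        filepos, size, raw = struct.unpack_from(
            "<ii8s", wad, infotableofs + 16 * i)
        name = raw.split(b"\0", 1)[0].decode("ascii", errors="replace")
        entries.append((name, filepos, size))
    return ident.decode("ascii"), entries


def odd_sized_flats(entries, flat_size=64 * 64):
    """List (name, size) for flats whose size is not flat_size bytes."""
    inside = False
    odd = []
    for name, _filepos, size in entries:
        if name == "F_START":
            inside = True
        elif name == "F_END":
            inside = False
        elif inside and size not in (0, flat_size):  # 0 = nested marker
            odd.append((name, size))
    return odd
```

Running `odd_sized_flats(read_directory(data)[1])` on the bytes of a PWAD gives a quick list of floor textures that a renderer expecting fixed 64x64 flats may choke on.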

Eero Tamminen wrote:I tried a couple of old Doom II PWADs with doom2.wad and BM got stuck here:
I think there may be flakiness problems atm with BadMood. Perhaps try again after I get some fixes checked in.
Eero Tamminen wrote: In both cases there was an invalid Fseek() offset. This one was from the second (earth.wad):
GEMDOS 0x42 Fseek(605905920, 67, 0)
Ok I'll keep that in mind. I have seen something like this but when I inspected the WADs they were not standard ones, they had nonstandard stuff inside.
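
One possible reading of the two offsets, offered as an assumption and not something confirmed in this thread: WAD files store integers little-endian while the 68030 is big-endian, and both values turn into ordinary multi-megabyte offsets when their bytes are reversed. The short sketch below just does that arithmetic; `swap32` is a helper name chosen for the example.

```python
def swap32(value: int) -> int:
    """Reverse the byte order of an unsigned 32-bit value."""
    return int.from_bytes(value.to_bytes(4, "big"), "little")


for bad_offset in (952923136, 605905920):
    swapped = swap32(bad_offset)
    print(f"{bad_offset:#010x} -> {swapped:#010x} ({swapped})")
```

This prints `0x38cc7400 -> 0x0074cc38 (7654456)` and `0x241d6400 -> 0x00641d24 (6561060)`. That pattern would fit an offset field read without byte-swapping, but it is equally compatible with the nonstandard-content explanation above, so it is a hint for where to look and nothing more.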
Eero Tamminen wrote: Then I tried the smallest Doom II WAD I found, "mtfactor.wad", and while that loaded fine, it gave this when I tried to start a new level:

Code: Select all

Zone: used memory: 0x91210
Zone: free memory: 0x6edf0
Error: Z_Malloc: failed on allocation of 138456 bytes
Z_Malloc failing probably means 1 MB isn't enough for Doom to allocate level data for things etc. I could raise it a bit further if this is a problem. But it can also happen if the WAD contains stuff that the game or BM doesn't understand, like floor textures which aren't 64x64, or empty directory entries for things that should be textures.
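
The numbers in that log can be checked directly: used plus free is exactly 1 MB, and the free total is larger than the failed request. A zone allocator needs one contiguous block, though, so a large free total does not guarantee success. The sketch below shows the arithmetic. The split of the free space into blocks is purely hypothetical, since the real block list is not in the log.

```python
ZONE_USED = 0x91210   # from the log above
ZONE_FREE = 0x6EDF0   # from the log above
REQUEST = 138456      # failed allocation size, from the log above

zone_size = ZONE_USED + ZONE_FREE  # 0x100000, i.e. exactly 1 MB


def can_allocate(free_blocks, size):
    """True if any single free block is large enough for the request."""
    return any(block >= size for block in free_blocks)


# Hypothetical split of the free bytes into four non-adjacent blocks.
fragmented = [120000, 120000, 120000, ZONE_FREE - 360000]

print(hex(zone_size), ZONE_FREE, can_allocate(fragmented, REQUEST))
```

With 454128 bytes free in total the 138456-byte request would fit in one piece, but not in any of the four hypothetical fragments. A bigger zone helps in both the "not enough memory" and the "too fragmented" case.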
Eero Tamminen wrote: Are there any nice PWADs for Doom I (in case they would need less RAM)?
There should be plenty but I really haven't had time to go and look at them this year. There's a site with '100 best WADs of all time' or suchlike, although a few of those I found were ZDoom format... they are by date so just rewind beyond the ZDoom incept date and it should be fine ;)
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Before the next checkin, I'll raise the RAM limit for the game code to 1.5 MB (from 1 MB) and try to remove the WAD version check, to see if this makes any difference.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:It does work - it's just that you have an old version of shareware WAD :-) Get the v1.9 revision
I thought I had, it was correct size and all... But I loaded another version and with that the timedemo works fine. Good, I'll provide new profiles for that next week.

dml wrote:Before the next checkin, I'll raise the ram limit for the game code to 1.5mb (from 1mb) and try to remove the WAD version check. See if this makes any difference.
Great, thanks! Hopefully I'll have time tomorrow to check these.

According to FreeDoom readme, its levels may have some Boom stuff:

Levels should be in Boom format; you may exceed the limits of Vanilla
Doom and use Boom features; however, do not use features that are not
supported by Boom 2.02 and compatible ports. Levels should be in Doom's
original format, not in ``Hexen'' format.

It is sensible to also heed the following guidelines:
...
  * Do not use tricks that exploit Doom's software renderer; some source
    ports, especially those that use hardware accelerated rendering, may
    not render it properly.  Examples of tricks to avoid include those used
    to simulate 3D bridges and ``deep water'' effects.
  * Boom removes almost all of the limits on rendering; however, do not
    make excessively complicated scenes.  It is desirable that Freedoom
    levels should be playable on old or low-powered hardware.
  * Always test in http://www.teamtnt.com/boompubl/boom2.htm[Boom]
    itself rather than a derivative such as PrBoom.  This ensures that
    your levels really are Boom-compatible rather than using any extra
    features.
...
=== Graphics

  * Graphics should be the same color and size as the originals to
    remain compatible with PWADs (otherwise, they may end up looking
    like a mess).  They cannot use the Doom font.
  * Textures should be the same dimensions as the originals.  They
    should be similar but not identical (to avoid IP infringement) --
...
  * Sprites should be roughly the same size and shape, but different to
    the originals.
...
Which may also explain the issues. Let's see after your fixes are in. :-)

I may look a bit into timedemo version checks too. I already peeked at the PrBoom code, but I need to check a bit what all those variables mean.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Post by dml »

Eero Tamminen wrote: I thought I had, it was correct size and all... But I loaded another version and with that the timedemo works fine. Good, I'll provide new profiles for that next week.
It's always possible I broke something but I made the same mistake a few times this week - having the wrong version of the DOOM2 wad sitting on the Falcon CFLASH and wondering why the demo wouldn't start during tests...
Eero Tamminen wrote:
dml wrote:Before the next checkin, I'll raise the ram limit for the game code to 1.5mb (from 1mb) and try to remove the WAD version check. See if this makes any difference.
Great, thanks! Hopefully I'll have time tomorrow to check these.
Done. Version checks removed, ram limit raised. My only checkouts now are gametick optimization related.
Eero Tamminen wrote: According to FreeDoom readme, its levels may have some Boom stuff:
Hmm. Limits in some cases aren't a problem because BM doesn't have them (or they are higher, where limits apply). However one of the 'mods' involved changing the coordinate storage system and that definitely won't work. I have also seen WADs with 'meta directory' structures, which confuse the loader. Won't be easy to guess what sort of things will cause problems without going through it all and checking it.


* Do not use tricks that exploit Doom's software renderer; some source
ports, especially those that use hardware accelerated rendering, may
not render it properly. Examples of tricks to avoid include those used
to simulate 3D bridges and ``deep water'' effects.

These probably wouldn't work in BM anyway. They are exploitations of unintended behaviour.

* Boom removes almost all of the limits on rendering; however, do not
make excessively complicated scenes. It is desirable that Freedoom
levels should be playable on old or low-powered hardware.

Fine :)


* Graphics should be the same color and size as the originals to
remain compatible with PWADs (otherwise, they may end up looking
like a mess). They cannot use the Doom font.
* Textures should be the same dimensions as the originals. They
should be similar but not identical (to avoid IP infringement) --
...
* Sprites should be roughly the same size and shape, but different to
the originals.

Also fine.
Eero Tamminen wrote: I may look a bit into timedemo version checks too. I already peeked at the PrBoom code, but I need to check a bit what all those variables mean.
Ok, that could be useful. I tried briefly to get my head around it but it just looks like a mess so I left it alone.

I suspect most of the original (Id) version changes were due to desync problems: bugfixes to the game code broke existing demos, which then had to be re-recorded. This is why all WAD versions have different demos ;) The other resources didn't change much.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Post by dml »

Given that the gamestate tick code costs so much (relative to rendering) and the fact the game tries to adapt by ticking gamestate in larger bursts to make up for lost frames, it's probably a worthwhile experiment to break this behaviour and allow the game to slow down when things get rough. This should operate like a sort of pressure valve and allow rendering to catch up, instead of getting stuck in pathological catchup activity.

Currently, when things get slow, the adaptive behaviour will act to make it even slower. This is only true because the cost of game code relative to rendering is so high.

If the cost of ticking had been relatively low (which it seems to be in the commercially released/rewritten/optimized/whatever PC and Jag versions) the adaptive behaviour would be ideal and exactly what you'd want. The game could catch up with realtime with impunity. In our case that isn't happening. OTOH, the adaptive stuff is mainly there for synchronized network games anyway (all clients tick synchronously so they must all try to chase realtime with reasonable accuracy).
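The feedback described here can be shown with a toy model. This is only a sketch under simple assumptions, not something measured from BadMood: each frame costs a fixed render time plus one tick cost for every tick period that elapsed during the previous frame. All names and numbers are hypothetical.

```c
/* One step of the feedback model: the ticks owed for this frame are the
   number of tick periods that elapsed during the previous frame. */
double next_frame_ms(double prev_frame_ms, double render_ms,
                     double tick_cost_ms, double tick_period_ms)
{
    double ticks_owed = prev_frame_ms / tick_period_ms;
    return render_ms + ticks_owed * tick_cost_ms;
}

/* Iterate the model for `frames` frames, starting from a frame that
   only paid for rendering, and return the last frame time. */
double settle(double render_ms, double tick_cost_ms,
              double tick_period_ms, int frames)
{
    double t = render_ms;
    for (int i = 0; i < frames; i++)
        t = next_frame_ms(t, render_ms, tick_cost_ms, tick_period_ms);
    return t;
}
```

In this model, a tick that costs less than one tick period lets the frame time settle at render_ms / (1 - tick_cost_ms / tick_period_ms). For example, `settle(100.0, 10.0, 20.0, 200)` converges on 200 ms. Once a tick costs as much as or more than its own period, every frame owes more ticks than the last and the frame time grows without bound, which is the runaway catchup case.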

So I'll probably try an experiment: fix the TICRATE at some sensible multiple of the average framerate (say, 6fps x2 = 12Hz, or x3 = 18Hz) and change the game loop to enforce exactly 2 (or 3) ticks per render regardless of what's happening. When the framerate drops, the action will slow down, but there will no longer be an 'exponential slowdown'. This would also require an artificial cap on FPS to stop the game running too fast in corner cases, but that's already true anyway with the adaptive system.
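A rough C sketch of that fixed-ratio loop, using the "6fps x2 = 12Hz" figures as example constants. The struct and function names are placeholders standing in for the game's real tick and display routines, so treat this as an illustration of the idea rather than the actual change.

```c
#include <assert.h>

/* Hypothetical tuning values, taken from the "6fps x2 = 12Hz" example. */
#define TICKS_PER_RENDER 2
#define FIXED_TICRATE_HZ 12

typedef struct {
    int gametic;  /* gamestate ticks run so far */
    int frames;   /* frames rendered so far */
} game_t;

/* Stand-ins for the game's real tick and display routines. */
static void tick_gamestate(game_t *g) { g->gametic++; }
static void render_frame(game_t *g)   { g->frames++; }

/* One pass of the loop: always the same number of ticks, then one render,
   regardless of how long the previous frame took. */
void run_fixed_frame(game_t *g)
{
    for (int i = 0; i < TICKS_PER_RENDER; i++)
        tick_gamestate(g);
    render_frame(g);
}

/* FPS cap: milliseconds to idle so that a frame never finishes faster than
   TICKS_PER_RENDER ticks of wall-clock time at FIXED_TICRATE_HZ. */
int frame_idle_ms(int frame_elapsed_ms)
{
    const int min_frame_ms = 1000 * TICKS_PER_RENDER / FIXED_TICRATE_HZ;
    assert(frame_elapsed_ms >= 0);
    return frame_elapsed_ms < min_frame_ms
               ? min_frame_ms - frame_elapsed_ms
               : 0;
}
```

With these constants the minimum frame time is 166 ms (integer division of 2000 by 12), so a frame that took 100 ms would idle for 66 ms, and a slow 400 ms frame would not idle at all; it simply makes game time run slower than real time.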

Another approach would be to cap the number of ticks at some low value and allow it to be slightly adaptive, but within limits, and work purely with time deltas instead of absolute time. The code is complicated, though, so that might be harder to get working as intended. It would also be less effective as a pressure valve.
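For comparison, the capped variant could look something like the sketch below. It works from the time delta since the last call, carries the sub-tick remainder forward, and throws away any backlog beyond the cap instead of trying to catch up. The cap value, struct and function names are all hypothetical.

```c
/* Hypothetical cap on gamestate ticks per rendered frame. */
#define MAX_TICKS_PER_RENDER 4

typedef struct {
    unsigned last_ms;   /* timestamp of the previous call */
    unsigned carry_ms;  /* leftover time not yet turned into a tick */
} tick_clock_t;

/* Returns how many ticks to run before the next render (0..cap).
   Uses only the delta since the last call, never absolute game time. */
int ticks_due(tick_clock_t *c, unsigned now_ms, unsigned tick_ms)
{
    unsigned acc = c->carry_ms + (now_ms - c->last_ms);
    int ticks = (int)(acc / tick_ms);

    c->last_ms = now_ms;
    if (ticks > MAX_TICKS_PER_RENDER) {
        /* Over the cap: run the maximum and drop the backlog,
           so game time slips behind real time. */
        ticks = MAX_TICKS_PER_RENDER;
        c->carry_ms = 0;
    } else {
        c->carry_ms = acc % tick_ms;
    }
    return ticks;
}
```

Below the cap this behaves adaptively; above it, the game slows down rather than piling up owed ticks. It still lets the tick count per frame vary up to the cap, which fits the point above about it being a weaker pressure valve than a fixed count.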

It's worth some experiments to see what works best. Probably will have a go in the evening.
dma
Atari God
Posts: 1223
Joined: Wed Nov 20, 2002 11:22 pm
Location: France

Post by dma »

By the way, while being in the toilet the other day (which is of no importance here) I was wondering if your DOOM engine could run on a standard 4MB Falcon, with specifically designed WADs (which would then have some size limits on their various contents)? With a view to using your engine for a Falcon-specific game.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Post by dml »

dma wrote: By the way, while being in the toilet the other day (which is of no importance here) I was wondering if your DOOM engine could run on a standard 4MB Falcon, with specifically designed WADs (which would then have some size limits on their various contents)? With a view to using your engine for a Falcon-specific game.
The engine/viewer will run in 4MB (or can do, with a bit of cleanup - it definitely did fit in the past). With the *current* Doom game attached to it however it won't fit.

The compiled Doom executable is 500k, and Doom gamestate wants approx 1MB -> 1.5MB for its own stuff. The framebuffers require another 256k, and TOS/GEM itself uses something, I forget how much (this can be overcome via the AUTO folder but that's not ideal for HD booting a game!). So that's nearly 3MB gone before BMEngine gets to load any data, and it's not counting BM's own static storage either, which is currently still on the large side: about 1MB, maybe more.
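Adding up just the figures quoted above gives a rough budget. This is a back-of-envelope sketch: it takes the upper 1.5MB gamestate figure, treats "500k" and "256k" as KB, uses 1MB for BM's static storage, and leaves TOS/GEM out because no number is given for it. The constant and function names are made up for the example.

```c
/* Figures from the post, in KB. TOS/GEM usage is not included
   because its size is not stated. */
enum {
    FALCON_RAM_KB   = 4096,
    DOOM_EXE_KB     = 500,
    DOOM_STATE_KB   = 1536,  /* upper end of "1MB -> 1.5MB" */
    FRAMEBUFFERS_KB = 256,
    BM_STATIC_KB    = 1024   /* "about 1MB, maybe more" */
};

/* Total of the known consumers, in KB. */
int known_usage_kb(void)
{
    return DOOM_EXE_KB + DOOM_STATE_KB + FRAMEBUFFERS_KB + BM_STATIC_KB;
}

/* What is left for TOS/GEM plus all WAD data on a 4MB machine. */
int remaining_kb(void)
{
    return FALCON_RAM_KB - known_usage_kb();
}
```

On these assumptions the known consumers come to 3316 KB, leaving about 780 KB to cover TOS/GEM and every texture, sprite and level lump that has to be loaded.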

Starting from scratch for the game would improve chances. Some additional stuff can be done with texture formats, per-texture lighting tables, acceleration info for masked sprites etc. to keep memory use down, if required. I haven't done much in that direction given that fitting Doom in 4MB looks so far from the mark as it is.

So if somebody started on such a project, I'd squeeze BMEngine for the smaller footprint.
