Bad Mood : Falcon030 'Doom'

Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

The shareware Doom I timedemo profile looks a bit different from the Ultimate Doom one.

CPU side for the whole timedemo:

Code:


Time spent in profile = 148.31308s.
Visits/calls:
- max = 162697, in R_AdvanceSurface_NMip0 at 0x53f04, on line 14195
- 1754321 in total
Executed instructions:
- max = 3345192, in R_AdvanceSurface_NMip0+142 at 0x53f92, on line 14251
- 268380120 in total
...
Executed instructions:
  23.02%                    61786615                       R_AdvanceSurface_NMip0
   8.07%                    21667860                       R_VisPlaneShaderQuickMip
   5.61%   5.62%  20.77%    15054648  15082592  55734874   _P_RunThinkers
   5.35%                    14349094                       R_AdvanceSurface_TMip0
   5.26%   5.32%   5.59%    14105733  14278839  15003289   _BM_P_CrossBSPNode
   4.24%                    11385011                       R_AdvanceSurface_NMip1
   4.13%                    11071492                       R_SpriteColumnShader_Masked2
   3.50%   3.50%   3.65%     9386054   9402807   9807779   _R_DrawColumn
   2.95%                     7913727                       R_AdvanceSurface_TMip1
   2.66%   2.66%   2.84%     7128908   7150053   7621201   stream_texture
   2.04%                     5464306                       R_DrawTSurface_Masked1
   1.99%   1.99%   4.61%     5327425   5338878  12366009   _P_CheckPosition
   1.98%   1.98%   2.06%     5313686   5318877   5538690   _R_PointInSubsector
   1.89%   1.90%   5.64%     5081655   5096160  15135288   _R_DrawVisSprite
   1.46%                     3909170                       R_VisPlaneShaderWarp
   1.41%                     3772994                       R_AdvanceSurface_TMip2
   1.38%                     3691096                       R_AdvanceSurface_NMip2
   1.37%   1.37%   1.37%     3684954   3687554   3687554   _BM_A_Mux1x2
   1.35%   1.36%   1.36%     3635526   3637466   3637466   _BM_A_Mux3x2
   1.33%                     3574756                       R_BSPHyperPlane
   0.94%   0.94%   0.97%     2512464   2516054   2597007   init_stategroups
   0.90%   0.90%   0.90%     2404748   2406088   2406088   _BM_A_Mux2x2
   0.89%   0.89%   0.94%     2381504   2387405   2526039   R_ViewTestSpriteLines
   0.86%   0.87%   0.92%     2307637   2339659   2481371   stack_visplane_area
   0.80%   0.80%   6.64%     2147361   2151333  17821195   _P_LookForPlayers
   0.79%                     2116031                       R_AddLine_loop
   0.76%                     2035553                       R_StackTransparentSurface
   0.74%   0.74%   6.37%     1990738   1998893  17105514   _BM_P_CheckSight
...
Instruction cache misses:
  11.09%  11.15%  54.57%     2786538   2800794  13709102   _P_RunThinkers
   6.48%   6.51%  13.36%     1628915   1634473   3354975   _P_CheckPosition
   6.44%   6.94%   7.11%     1616801   1744181   1786452   _BM_P_CrossBSPNode
   5.04%   5.07%   8.54%     1267104   1273006   2144728   _R_DrawVisSprite
   4.85%   4.87%  12.00%     1219499   1222627   3015465   _BM_P_CheckSight
   3.89%   3.90%   4.08%      976281    978949   1023763   _PIT_CheckLine
   3.54%   3.54%   3.76%      888256    889951    944701   R_AddSpriteSpans
   3.21%   3.25%   3.34%      806535    815885    839164   _R_DrawColumn
   2.95%   2.96%  14.69%      740586    742622   3690299   _P_LookForPlayers
   2.73%   2.74%  32.28%      686794    688275   8108937   _P_SetMobjState
   2.40%   2.42%   2.46%      603949    607207    618915   _R_PointInSubsector
   1.85%   1.86%   1.89%      464713    467583    475357   R_ViewTestSpriteLines
   1.49%   1.50%  14.63%      375226    376585   3674987   _P_TryMove
NMip0 wasn't even close to this high in the Ultimate Doom timedemo profile. Its heaviest part in this profile looks like this:

Code:

$053f56 :             jmp       $53f5a(pc,d0.w*2)          0.06% (164773, 1978860, 0)
$053f5a :             bra.s     $53f92                     0.02% (63585, 512704, 796)
$053f5c :             bra.s     $53f86                     0.01% (34257, 276624, 546)
$053f5e :             bra.s     $53f7a                     0.01% (31832, 257096, 1155)
$053f60 :             bra.s     $53f6e                     0.01% (35099, 283608, 1260)
$053f62 :             adda.l    d6,a0                      1.19% (3180419, 12769116, 6367)
$053f64 :             move.b    (a2,d4.w),d0               1.19% (3180419, 38211184, 2634)
$053f68 :             addx.l    d3,d4                      1.19% (3180419, 12572, 138)
$053f6a :             move.w    (a5,d0.w*2),(a0)           1.19% (3180419, 50936912, 0)
$053f6e :             adda.l    d6,a0                      1.20% (3215518, 153636, 4881)
$053f70 :             move.b    (a2,d4.w),d0               1.20% (3215518, 38652764, 2664)
$053f74 :             addx.l    d3,d4                      1.20% (3215518, 6656, 264)
$053f76 :             move.w    (a5,d0.w*2),(a0)           1.20% (3215518, 51499324, 0)
$053f7a :             adda.l    d6,a0                      1.21% (3247350, 158000, 9623)
$053f7c :             move.b    (a2,d4.w),d0               1.21% (3247350, 39046644, 6995)
$053f80 :             addx.l    d3,d4                      1.21% (3247350, 8588, 1498)
$053f82 :             move.w    (a5,d0.w*2),(a0)           1.21% (3247350, 52008868, 0)
$053f86 :             adda.l    d6,a0                      1.22% (3281607, 168032, 10740)
$053f88 :             move.b    (a2,d4.w),d0               1.22% (3281607, 39454004, 5987)
$053f8c :             addx.l    d3,d4                      1.22% (3281607, 7820, 1582)
$053f8e :             move.w    (a5,d0.w*2),(a0)           1.22% (3281607, 52556112, 0)
$053f92 :             dbra      d2,$53f62                  1.25% (3345192, 14304704, 0)
$053f96 :             movea.l   $ffe0(a6),a5               0.06% (164773, 1979028, 0)
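For anyone not fluent in 68k assembly: the listing above looks like a 4x unrolled column loop that is entered through a small jump table (the jmp followed by four bra.s), much like Duff's device in C. Below is a rough C sketch of that shape. It is not the BM source; the function name, the parameter names and the 16.16 fixed-point texture coordinate are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* One step mirrors one adda / move.b / addx / move.w group in the listing:
 * advance the destination, fetch a texel, step the texture coordinate,
 * write the colour looked up for that texel. */
#define COLUMN_STEP()                         \
    do {                                      \
        dst += stride;                        \
        *dst = clut[texcol[v >> 16]];         \
        v += dv;                              \
    } while (0)

/* Hypothetical sketch. dst points one row before the first pixel, because
 * the pointer is advanced before each write (as adda precedes move.w above).
 * v and dv are assumed to be 16.16 fixed point; the caller keeps v in range. */
static void draw_column_sketch(uint16_t *dst, ptrdiff_t stride,
                               const uint8_t *texcol, const uint16_t *clut,
                               uint32_t v, uint32_t dv, unsigned count)
{
    unsigned blocks;

    if (count == 0)
        return;

    blocks = (count + 3) / 4;      /* number of passes through the loop body */
    switch (count & 3) {           /* jump into the middle of the first pass */
    case 0: do { COLUMN_STEP();
    case 3:      COLUMN_STEP();
    case 2:      COLUMN_STEP();
    case 1:      COLUMN_STEP();
            } while (--blocks > 0);
    }
}
```

The per-line visit counts in the listing (roughly 3.2 million for each of the four groups) fit this picture: the entry point only matters once per column, after which all four groups run on every pass.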
DSP side:

Code:

Used cycles:
  35.38%                  1683464532                       command_base
  28.39%  30.17%  30.17%  1350764560 1435799778 1435799778   R_DoColumnPerspCorrect
  10.26%  10.26%  10.26%   488014668 488014668 488014668   VPRenderSpanQuickMip
   4.72%                   224793024                       R_VPLoadTexture
   3.91%                   186283314                       ALGO_P_CrossBSPNode
   2.12%                   100933040                       R_DoColumnTextureUV
   2.11%   2.11%   2.11%   100378774 100378774 100378774   extract_subvisplane
   1.80%   1.80%   1.80%    85565356  85565356  85565356   VPRenderSpanWarp
   1.51%                    71730022                       P_CrossSubsector_body
   1.19%   1.36%  26.05%    56741114  64781322 1239476156   AddMidWall
   0.94%                    44577754                       R_ViewTestAddLine
   0.82%                    39203548                       R_VPRenderPlane
   0.70%   0.84%   3.83%    33341938  39864486 182313884   AddLowerWall
   0.69%                    32842868                       project_node
   0.54%   0.64%   3.66%    25560078  30535764 174079676   AddUpperWall
   0.52%   0.52%   0.52%    24822376  24828764  24828764   InterceptVectorsUF
Only a bit over 1/3 of the DSP time goes to polling at command_base, although the full run also includes the thinking parts.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote:I will try to look into Boom code at some point and port at least some of the version difference stuff to BM.
That would be great :)

BTW I checked in some changes which reduce gametick cost further when TIMEBASE_CONTROL >= 4. Obviously it won't auto-profile, but if the gametick overhead turns out to be very low on average and manageable on 'worst frames', that would be good news. I haven't checked gametick performance in these modes recently, as I'm still zipping around the code fixing and finishing random things.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

There were also a few resource reads during the timedemo gameplay, for:
- MISLA5
- MISLB0
- MISLC0
- MISLD0
- MISLA6A4
- POL5A0

And a small texture cache read:

Code:

- 0x51a74: display_engine (return = 0x4a79e)
- 0x54afa: R_DrawTransparentSurfaces (return = 0x51c42)
- 0x4ed44: cache_resource (return = 0x54dee)
- 0x512f8: D_CacheCtor_Sprite (return = 0x4ee1e)
- 0x50486: D_TextureCacheIn (return = 0x51312)
GEMDOS 0x3F Fread(66, 4, 0x1d0074)
GEMDOS 0x3F Fread(66, 1140, 0x974fd0)
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: NMip0 wasn't even close to this high in the Ultimate Doom timedemo profile. Its heaviest part in this profile looks like this:
All of the R_AdvanceSurface_??? code paths are optimized, but they are selected based on complex sets of conditions which are influenced by the map, wall heights, texture sizes and so on. One path might appear to dominate suddenly because the scenery dictates it. This doesn't mean there is a problem, just that the total work is more or less concentrated. If you add up the time in all R_AdvanceSurface_ paths you'll get the total work spent on drawing walls.
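As a worked example of that adding up, here is a small C sketch that sums the executed-instruction counts of the R_AdvanceSurface_ paths visible in the shareware timedemo listing at the top of this thread. The counts are copied from that listing; the identifiers are made up for the example, and paths below the listing cutoff are not included.

```c
#include <stddef.h>
#include <stdint.h>

/* Executed-instruction counts copied from the shareware timedemo profile. */
static const uint32_t advance_surface_instr[] = {
    61786615u,  /* R_AdvanceSurface_NMip0 */
    14349094u,  /* R_AdvanceSurface_TMip0 */
    11385011u,  /* R_AdvanceSurface_NMip1 */
     7913727u,  /* R_AdvanceSurface_TMip1 */
     3772994u,  /* R_AdvanceSurface_TMip2 */
     3691096u   /* R_AdvanceSurface_NMip2 */
};

/* "Executed instructions ... in total" from the same profile. */
static const uint32_t total_instr = 268380120u;

/* Sum of all listed wall paths. */
static uint32_t wall_instr_total(void)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < sizeof advance_surface_instr / sizeof advance_surface_instr[0]; i++)
        sum += advance_surface_instr[i];
    return sum;
}

/* Share of all executed instructions spent in the listed wall paths, in percent. */
static double wall_instr_share(void)
{
    return 100.0 * (double)wall_instr_total() / (double)total_instr;
}
```

That comes to 102898537 instructions, a bit over 38% of the run, which is the figure to compare between profiles rather than NMip0 alone.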

For example, if you have a room where one texture is used everywhere and it is not a tiled type, it will use the non-tiled (NMip?) paths for drawing all of the walls. So that path group will register much higher because the work concentrates there. But the same amount of work is being done (in fact, when it works well, less work is being done in total since the paths are optimized for those cases).

Likewise if most of the action takes place in a cramped area, then the 'top' mip paths (?Mip0) will be used more often, and work will concentrate in those. If the scenery is distant, the work divides into more paths for different distances, and one path is less likely to dominate.

The most extreme case is facing one wall. Then you'll get all the wall drawing work in a single path and it will shoot to the top of the profiling list and look very suspicious :)
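To make the concentration effect concrete, here is a deliberately simplified C sketch of a path selector. The real selection logic is described above only as "complex sets of conditions", so everything here is an assumption for illustration: the struct, the function name, the rule that a wall taller than its texture needs the tiled path, and the distance thresholds.

```c
#include <stdbool.h>

enum { MAX_MIP_SKETCH = 3 };

/* Hypothetical description of which R_AdvanceSurface_ variant would be used. */
typedef struct {
    bool     tiled;  /* true: TMip? group, false: NMip? group */
    unsigned mip;    /* 0 = closest / most detailed level */
} wall_path_sketch;

/* Assumed rules: a wall taller than its texture must repeat it (tiled),
 * and the mip level goes up each time the distance doubles past 256 units. */
static wall_path_sketch select_wall_path_sketch(unsigned wall_height,
                                                unsigned tex_height,
                                                unsigned distance)
{
    wall_path_sketch p;
    unsigned limit = 256;

    p.tiled = wall_height > tex_height;
    p.mip = 0;
    while (p.mip < MAX_MIP_SKETCH && distance >= limit) {
        p.mip++;
        limit *= 2;
    }
    return p;
}
```

With rules like these, a cramped room using one non-tiled texture sends every wall to the same (NMip0) result, while a wide open map spreads the same work over several results.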
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote:There were also a few resource reads during the timedemo gameplay, for:
- MISLA5
- MISLB0
- MISLC0
- MISLD0
- MISLA6A4
- POL5A0
Actually I think all of the weapon projectiles and some other dynamics are not precached. That will include the blue plasma gun sprites and the green BFG sprite, rocket and some others. Should be easy to add those to the precache list. I'll make a note to do it for next checkin.
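A minimal sketch of what such a precache list could look like, assuming sprites are matched by the first four characters of the lump name. MISL and POL5 come from the reads listed above; PLSS, PLSE, BFS1, BFE1 and BFE2 are the standard Doom sprite names for the plasma and BFG projectiles. The table and function names are hypothetical, not the ones used in BM.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of sprite name prefixes to load before gameplay starts. */
static const char *const precache_sprite_prefixes[] = {
    "MISL",                 /* rocket */
    "POL5",                 /* seen in the resource reads above */
    "PLSS", "PLSE",         /* plasma shot and its explosion */
    "BFS1", "BFE1", "BFE2"  /* BFG shot and its explosions */
};

/* Returns true when a sprite lump such as "MISLA6A4" belongs to the list. */
static bool sprite_lump_wants_precache(const char *lump_name)
{
    size_t i;

    for (i = 0; i < sizeof precache_sprite_prefixes / sizeof precache_sprite_prefixes[0]; i++) {
        if (strncmp(lump_name, precache_sprite_prefixes[i], 4) == 0)
            return true;
    }
    return false;
}
```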
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

Worst shareware Doom I timedemo frame profile information...

Rendering part, CPU side:

Code:

Time spent in profile = 0.21407s.
...
Executed instructions:
  32.25%                      136062                       R_SpriteColumnShader_Masked2
  12.52%                       52815                       R_AdvanceSurface_NMip0
  11.60%                       48922                       R_DrawTSurface_Masked1
  11.58%                       48846                       R_VisPlaneShaderQuickMip
   7.27%   7.27%   7.27%       30663     30683     30683   _BM_A_Mux3x2
   4.79%                       20220                       R_AdvanceSurface_NMip1
   2.52%                       10632                       R_StackTransparentSurface
   2.36%                        9947                       R_AdvanceSurface_NMip2
   2.30%   2.31%   3.18%        9708      9758     13435   stream_texture
   1.75%   1.76%   1.76%        7385      7425      7425   stack_visplane_area
   1.63%                        6870                       R_BSPHyperPlane
   1.24%                        5228                       R_AddLine_loop
   1.23%   1.24%   1.35%        5203      5211      5715   R_AddSpriteSpans
   0.99%   0.99%   0.99%        4176      4183      4183   R_SetSubSectorLuma
...
Instruction cache misses:
  20.45%  20.49%  22.14%        3109      3115      3366   R_AddSpriteSpans
   7.23%                        1099                       R_SpriteColumnShader_Masked2
   5.34%   5.43%   6.45%         812       826       981   R_ViewTestSpriteLines
   4.51%                         686                       build_ssector
   4.10%                         624                       R_AddLine_loop
   3.75%                         570                       R_DrawTSurface_Masked1
   3.60%   3.60%  19.49%         548       548      2964   R_FlushDeferredSurfaces
   3.08%   3.16%   3.16%         469       480       480   _BM_A_Mux3x2
   2.79%                         425                       R_StackTransparentSurface
   2.74%   2.74%  25.47%         417       417      3873   R_SubSectorTryFlush
   2.57%   2.57%   2.57%         391       391       391   cache_resource
   2.34%   2.34%   5.50%         356       356       836   _subframe_block
   2.21%                         336                       R_AddOverlappingSprites
   2.20%                         335                       R_BSPHyperPlane
   2.11%   2.13%   2.13%         321       324       324   add_wall_segment
   1.99%                         302                       R_AdvanceSurface_NMip1
   1.86%                         283                       init_font
DSP side:

Code:

Used cycles:
  46.32%                     3181764                       command_base
  17.01%  18.07%  18.07%     1168074   1241024   1241024   R_DoColumnPerspCorrect
  14.39%  14.39%  14.39%      988656    988656    988656   VPRenderSpanQuickMip
   5.25%                      360294                       R_VPLoadTexture
   3.94%   3.94%   3.94%      270564    270564    270564   extract_subvisplane
   2.28%                      156508                       R_DoColumnTextureUV
   1.64%                      112596                       R_VPRenderPlane
   1.16%                       79760                       R_ViewTestSpriteLine
   1.05%                       72216                       R_ViewTestAddLine
   0.99%   1.12%  18.43%       68010     77182   1266110   AddMidWall
   0.96%                       65960                       project_node
   0.76%   1.04%   3.07%       51930     71180    211076   AddTransWall
   0.57%                       38922                       R_CheckBBoxPair
   0.52%   1.66%   1.66%       35826    113836    113836   R_DoColumnConstantClone
Thinking part, CPU side:

Code:

Time spent in profile = 0.14945s.
...
Executed instructions:
  13.80%  13.84%  74.44%       33202     33285    179093   _P_RunThinkers
  13.01%  13.18%  14.76%       31301     31713     35504   _BM_P_CrossBSPNode
  12.33%  12.36%  29.45%       29665     29727     70858   _P_CheckPosition
  11.06%  11.09%  19.36%       26602     26678     46576   _R_DrawVisSprite
  10.52%  10.52%  10.52%       25306     25320     25320   _R_PointInSubsector
   8.50%   8.51%   8.51%       20442     20462     20462   _BM_A_Mux3x2
   6.67%   6.68%   6.68%       16040     16061     16061   _R_DrawColumn
   4.18%   4.20%   6.07%       10046     10095     14598   _PIT_CheckLine
   2.09%   2.10%   3.70%        5037      5061      8906   _PIT_CheckThing
   1.95%   1.95%  16.24%        4690      4690     39061   _P_LookForPlayers
   1.91%   1.91%  16.67%        4588      4606     40110   _BM_P_CheckSight
   1.12%   1.12%  34.04%        2692      2699     81887   _P_SetMobjState
   1.01%   1.01%  23.78%        2441      2441     57202   _P_TryMove
   0.69%   0.69%  12.15%        1666      1670     29230   _P_ChangeSector
   0.67%   0.67%   3.86%        1601      1601      9294   _P_SetThingPosition
   0.64%   0.64%   0.64%        1550      1550      1550   print_char
   0.62%   0.62%  12.82%        1487      1487     30839   _P_XYMovement
   0.56%   0.56%   0.56%        1337      1337      1337   _P_UnsetThingPosition
   0.52%   0.52%   0.52%        1255      1255      1255   _P_UpdateSpecials
   0.50%   0.50%   0.54%        1206      1206      1289   _R_ClearPlanes
   0.50%   0.50%   9.01%        1205      1205     21667   _subframe_block
...
Instruction cache misses:
  15.28%  15.33%  32.48%        8968      8997     19065   _P_CheckPosition
  11.11%  11.17%  78.51%        6519      6559     46081   _P_RunThinkers
  10.18%  10.22%  11.20%        5976      5999      6572   _PIT_CheckLine
   9.03%   9.09%  14.98%        5300      5337      8795   _R_DrawVisSprite
   5.88%   6.39%   6.61%        3451      3749      3878   _BM_P_CrossBSPNode
   5.56%   5.58%   5.58%        3265      3274      3274   _R_DrawColumn
   4.99%   5.00%   5.00%        2928      2934      2934   _R_PointInSubsector
   4.78%   4.79%  11.40%        2803      2812      6690   _BM_P_CheckSight
   2.77%   2.78%  36.86%        1628      1631     21637   _P_SetMobjState
   2.75%   2.75%  13.24%        1612      1612      7770   _P_LookForPlayers
   2.71%   2.71%  27.97%        1591      1591     16418   _P_TryMove
   2.34%   2.35%   2.62%        1371      1382      1537   _PIT_CheckThing
   1.77%   1.77%   3.43%        1039      1039      2014   _P_SetThingPosition
   1.31%   1.31%   1.31%         770       770       770   _P_UnsetThingPosition
   1.27%   1.27%  15.15%         743       743      8893   _P_XYMovement
...
DSP cycles:
  84.53%                     4053306                       command_base
   9.65%                      462814                       ALGO_P_CrossBSPNode
   2.93%                      140672                       P_CrossSubsector_body
   1.15%   1.15%   1.15%       55056     55068     55068   InterceptVectorsUF
   1.08%                       51636                       Divs48_Real
The rendering part could be quicker than in other worst-frame profiles because statusbar rendering is disabled.

IMHO the most interesting thing here is that the worst thinking frame is much faster than the worst rendering frame.

The heaviest part of P_CheckPosition() is the following:

Code:

$03132c :             cmpi.w    #$ffff,(a2)                0.06% (141, 1456, 82)
$031330 :             beq.s     $31366                     0.06% (141, 564, 0)
$031332 :             move.w    (a2),d0                    0.31% (748, 6452, 187)
$031334 :             muls.w    #$3e,d0                    0.31% (748, 24872, 117)
$031338 :             movea.l   $2b1bd4,a0                 0.31% (748, 9036, 0)
$03133e :             adda.l    d0,a0                      0.31% (748, 2992, 117)
$031340 :             move.w    $8e858,d0                  0.31% (748, 5984, 0)
$031346 :             cmp.w     $36(a0),d0                 0.31% (748, 5984, 0)
$03134a :             beq.s     $3135e                     0.31% (748, 3468, 117)
$03134c :             move.w    d0,$36(a0)                 0.25% (605, 5020, 80)
$031350 :             move.l    a0,-(sp)                   0.25% (605, 7260, 0)
$031352 :             bsr       $324e8                     0.25% (605, 4840, 0)
$031356 :             addq.l    #4,sp                      0.25% (605, 2420, 227)
$031358 :             tst.l     d0                         0.25% (605, 2420, 0)
$03135a :             beq       $312a6                     0.25% (605, 4840, 0)
$03135e :             addq.l    #2,a2                      0.31% (736, 2936, 116)
$031360 :             cmpi.w    #$ffff,(a2)                0.31% (736, 6352, 116)
$031364 :             bne.s     $31332                     0.31% (736, 5428, 0)
$031366 :             addq.l    #1,d2                      0.05% (129, 516, 71)
But it's a long function and the above is only a small part of it. Please also see the attached callgraph.
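The loop in the listing reads like the line loop of P_BlockLinesIterator() from the Linux Doom sources, apparently inlined into P_CheckPosition(): walk a list of 16-bit line indices terminated by -1 ($ffff), skip lines already stamped with the current validcount, stamp the rest and call a per-line check. A self-contained C sketch of that pattern follows. The struct is cut down to the one field used, and the type and function names are made up for this example, so treat it as an interpretation of the disassembly rather than the actual BM code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cut-down stand-in for line_t; the listing suggests a 62-byte ($3e)
 * structure with validcount at offset $36. */
typedef struct {
    int16_t validcount;
} line_sketch;

/* Walk a -1 terminated index list and call func once per line and pass. */
static bool iterate_block_lines_sketch(const int16_t *list, line_sketch *lines,
                                       int16_t validcount,
                                       bool (*func)(line_sketch *))
{
    for (; *list != -1; list++) {
        line_sketch *ld = &lines[*list];

        if (ld->validcount == validcount)
            continue;               /* already checked during this pass */
        ld->validcount = validcount;
        if (!func(ld))
            return false;           /* the check asked to stop early */
    }
    return true;
}

/* Example callback that only counts how many lines were really checked. */
static int lines_checked_sketch;

static bool count_line_sketch(line_sketch *ld)
{
    (void)ld;
    lines_checked_sketch++;
    return true;
}
```

In the listing, 605 of the 748 fetched indices get past the validcount test, so most of the cost here is the per-line call rather than the loop bookkeeping.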
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:Actually I think all of the weapon projectiles and some other dynamics are not precached. That will include the blue plasma gun sprites and the green BFG sprite, rocket and some others. Should be easy to add those to the precache list. I'll make a note to do it for next checkin.
I'm not sure they all need to be pre-cached. How slow is that stuff on a real Falcon nowadays?

The things that need to be pre-cached are those that might have to be loaded in the middle of hectic fighting. Heavier weaponry is often found in some "cache" :) and the action mostly happens only after picking the thing up and triggering something by it (a wall opens to reveal a mass of slumbering monsters surprised by the weapons thief).
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: I'm not sure they all need to be pre-cached. How slow is that stuff on a real Falcon nowadays?
If the item has never been seen on that system, it pauses for 1 second or so per item to do all the mipmapping, postmap generation stuff and writing out the local cache file to disk.

If that's already done and it tries to load the local version instead, I don't notice any delay from that. But my test machine is CFLASH based so it may be much slower from IDE. I don't know.

The main thing is to get the WAD resources converted into the local cache before playing, otherwise nasty pauses do occur.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: The rendering part could be quicker than in other worst-frame profiles because statusbar rendering is disabled.

IMHO the most interesting thing here is that the worst thinking frame is much faster than the worst rendering frame.

The heaviest part of P_CheckPosition() is the following:

But it's a long function and the above is only a small part of it. Please also see the attached callgraph.
Ok I'll take a look at P_CheckPosition(). I think I did something with it already but not much, perhaps just inlining work.

(Will also look at the CG).

I might take another look at sprites to make sure the right path is chosen for different sized sprites, i.e. small sprites use pixel testing with cheap setup, big sprites use 'datacached posts', and really big sprites use a modified version of posts for fewer memory fetches. Currently it always uses the middle case, which is the best overall, but specifics for the other two will help in some cases (tons of sprites at a distance, or one really close).
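The three-way choice could be sketched in C roughly as below. The enum names, the use of projected on-screen height as the size measure and both thresholds are assumptions for illustration; the post does not say which measure or cutoffs would be used.

```c
/* Hypothetical names for the three sprite drawing strategies described above. */
typedef enum {
    SPRITE_PATH_PIXEL_TEST,    /* small: per-pixel test, cheap setup */
    SPRITE_PATH_CACHED_POSTS,  /* big: 'datacached posts' (current default) */
    SPRITE_PATH_WIDE_POSTS     /* really big: modified posts, fewer fetches */
} sprite_path_sketch;

/* Assumed cutoffs in screen pixels; real values would need profiling. */
enum {
    SPRITE_SMALL_MAX_SKETCH = 16,
    SPRITE_BIG_MAX_SKETCH   = 128
};

/* Pick a drawing path from the sprite's projected height on screen. */
static sprite_path_sketch select_sprite_path_sketch(unsigned projected_height)
{
    if (projected_height <= SPRITE_SMALL_MAX_SKETCH)
        return SPRITE_PATH_PIXEL_TEST;
    if (projected_height <= SPRITE_BIG_MAX_SKETCH)
        return SPRITE_PATH_CACHED_POSTS;
    return SPRITE_PATH_WIDE_POSTS;
}
```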
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

dml wrote: (Will also look at the CG).
The CG is quite tidy and clear. However, most of the time is spent in those leaves, because the whole tree is trying to run several times each frame, on hundreds of objects.

Fortunately the changes which have been made with TIMEBASE_CONTROL>=4 have regulated that down significantly.


The last changes make the tick rate for each object a function of its type, distance and current state. So things falling under gravity or exploding can tick at the standard rate, but they drop back to 'managed' mode when they complete their cycle. In managed mode some types of object only tick rapidly near the player (e.g. pickups) with rapid falloff with distance, while others tick a little faster near the player but overall at a low frequency (corpses), so they still respond to moving floors etc. but don't steal much time. Enemies have a high tick rate near the player and a medium falloff with distance, and so on.

This keeps the thinking cycles really low during play.
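As an illustration of the scheme described above, here is a small C sketch that maps type, state and distance to a tick interval (1 = tick every gametic, larger = less often). All identifiers, distance units and numbers are invented for the example; the actual TIMEBASE_CONTROL code may be organised quite differently.

```c
#include <stdbool.h>

/* Hypothetical object classes taken from the description above. */
typedef enum {
    OBJ_PICKUP_SKETCH,
    OBJ_CORPSE_SKETCH,
    OBJ_ENEMY_SKETCH
} obj_type_sketch;

static unsigned clamp_interval_sketch(unsigned value, unsigned max)
{
    return value > max ? max : value;
}

/* Returns how many gameticks pass between thinker updates.
 * 'active' stands for "falling under gravity or exploding", which ticks
 * at the standard rate; everything else is in 'managed' mode. */
static unsigned tick_interval_sketch(obj_type_sketch type, bool active,
                                     unsigned distance)
{
    if (active)
        return 1;

    switch (type) {
    case OBJ_PICKUP_SKETCH:  /* fast near the player, rapid falloff */
        return clamp_interval_sketch(1 + distance / 128, 32);
    case OBJ_CORPSE_SKETCH:  /* always slow, a little faster when near */
        return distance < 512 ? 8 : 16;
    case OBJ_ENEMY_SKETCH:   /* fast near the player, medium falloff */
        return clamp_interval_sketch(1 + distance / 512, 8);
    }
    return 1;
}

/* An object is updated on tics that are a multiple of its interval. */
static bool should_tick_sketch(unsigned gametic, unsigned interval)
{
    return gametic % interval == 0;
}
```

With these made-up numbers an idle enemy 1024 units away is updated every third tic, while the same enemy ticks every tic again as soon as it is falling or exploding.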
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:
Eero Tamminen wrote:With 1.5MB alloc, overlay WADs work.

So far "mtfactor.wad" & "polygon.wad" are the only Doom II wads I got to load without the "fseek to insane value" problem.
Can you send me one or two of the ones which don't work? Smaller ones should be fine. But they should work with vanilla Doom II either under DOS or compiled from the Linux sources - not some fancy port as it's impossible to tell if they use unusual features.
The two working ones are 0.5MB and 1.1MB, all the other WADs are larger. I've earlier sent you one of them, the phobos one, do you still have that?

dml wrote:
Eero Tamminen wrote: "polygon.wad" looks better in BM, but it has lower FPS and in some places it's really slow without any visible reason. I think I need to profile it a bit later.
I found that the original Doom levels were very carefully made to use sectors efficiently. Many of the PWAD overlays were made much later, by people with faster boxes and a less well formed understanding of level efficiency. So some of those maps look nice, while saturating the machine.
The pauses were a second or more in length, so I think there's something else going on than just heavy scenery. I'll need to play it a couple of times first to make sure it's not just caching. If the slowdown is still there, then it needs profiling.

dml wrote:
Eero Tamminen wrote: Of the Doom I wads, "teutic.wad" and "eternity.wad" work well and e.g. last one is IMHO pretty nice.

For some reason other Doom I pwads got stuck at sound init. CPU was looping on some unnamed code coming after "init_font" (as it's unnamed, I think it could be code that sometimes shows in profiles assigned under "init_font").
Hard to say what that might be. If you keep a note of any which don't work but you are fairly sure are vanilla PWADs I'll look into it though. Should be possible to get all the 'normal' ones working and maybe a subset of the funny Boom/ZDoom ones if they don't depend on special features.
"infinity.wad" seems to be also working.

"galaxia.wad" and "trinity.wad" work in DOS Doom I, but not in BM. These WADs have their own sounds (at least music) unlike the other WADs, which might explain why it breaks with BM...?

IMHO "galaxia.wad" is pretty crappy. "trinity.wad" has a really huge open area, so it might be a good test case (note that it has some missing texture patches under both DOS Doom and PrBoom).

The WADs are from here: http://www.doomworld.com/10years/bestwads/1994.php (get the "Trinity College" one)
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

bullis1 wrote:I'm just chiming in to say (along with everybody else) that the video was jaw-dropping.
Cheers, much appreciated.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

I took a profile of the Doom I timedemo running through all 4 levels, not just the first one, including the level loads. This was from the second round of running it, so anything that gets cached to disk had already been cached.

CPU side looks like this:

Code:

Time spent in profile = 708.77511s.
...
Executed instructions:
  13.24%                   171617365                       R_AdvanceSurface_NMip0
   6.08%                    78774673                       R_VisPlaneShaderQuickMip
   5.66%   5.67%  18.83%    73410221  73533386 244089954   _P_RunThinkers
   5.64%                    73156447                       R_SpriteColumnShader_Masked2
   5.57%                    72224212                       R_AdvanceSurface_TMip0
   4.35%   4.40%   4.62%    56355301  57055357  59841262   _BM_P_CrossBSPNode
   4.24%                    54910451                       R_AdvanceSurface_NMip1
   4.06%   4.07%   4.13%    52663768  52732863  53580083   D_FlatMipGen_8_16
   4.06%   4.06%   4.16%    52573132  52676804  53958137   _V_DrawPatch
   3.76%                    48694218                       R_AdvanceSurface_TMip1
   2.10%   2.11%   2.24%    27237412  27326936  29091262   stream_texture
   1.95%                    25214320                       R_AdvanceSurface_TMip2
   1.90%   1.90%   1.98%    24577908  24623328  25707812   _R_DrawColumn
   1.84%   1.84%   1.92%    23830368  23855678  24820578   _R_PointInSubsector
   1.57%   1.58%   3.78%    20405623  20451388  49030735   _P_CheckPosition
   1.49%   1.49%   1.49%    19353118  19366022  19366022   _BM_A_Mux1x2
   1.37%                    17806731                       R_AdvanceSurface_NMip2
   1.32%   1.32%   1.39%    17100531  17138176  17977808   R_ViewTestSpriteLines
   1.31%   1.31%   1.31%    16997874  17003637  17003637   _BM_A_Mux3x2
   1.29%                    16773021                       R_BSPHyperPlane
   1.27%   1.27%   3.32%    16469823  16505130  43070262   _R_DrawVisSprite
   0.99%   0.99%   1.01%    12820899  12837582  13071304   strcmp_8
   0.97%   0.97%   1.01%    12612910  12635799  13123215   _P_GroupLines
   0.90%                    11682784                       R_VisPlaneShaderWarp
   0.87%   0.87%   5.95%    11239374  11259859  77157991   _P_LookForPlayers
   0.78%   0.78%   0.84%    10053302  10072047  10881352   R_AddSpriteSpans
   0.77%   0.77%   0.81%     9945066   9971979  10507117   _P_UpdateSpecials
   0.74%   0.74%   0.77%     9631112   9645662   9957487   init_stategroups
   0.73%   0.73%   1.35%     9405266   9424798  17476312   D_FlatMipRemap_16_8
   0.72%                     9340133                       R_AddLine_loop
   0.72%                     9285011                       R_VisPlaneSkyShader
   0.69%   0.69%   0.71%     8930304   8954050   9265094   correct_element
   0.68%   0.68%   5.35%     8764552   8804749  69295153   _BM_P_CheckSight
   0.64%                     8349144                       R_AdvanceSurface_TMip3
   0.62%   0.62%   0.62%     7989948   7994308   7994308   _BM_A_Mux2x2
   0.60%                     7823401                       R_StackTransparentSurface
   0.60%                     7817926                       R_DrawSurface_NoTileNoMip
   0.59%   0.59%   0.64%     7668804   7682559   8300621   D_TextureRemapInit
   0.59%   0.59%   0.61%     7582195   7593615   7848167   _R_TextureNumForName
   0.58%   0.58%   0.60%     7515666   7527095   7783787   R_SetSubSectorLuma
   0.50%   0.51%   0.55%     6540692   6672706   7084305   stack_visplane_area
...
Instruction cache misses:
   9.40%   9.45%  47.82%    11536483  11601409  58686746   _P_RunThinkers
   5.76%   5.77%  11.74%     7063404   7085177  14402471   _P_CheckPosition
   4.92%   5.35%   5.47%     6039494   6566120   6717978   _BM_P_CrossBSPNode
   4.35%   4.36%   9.87%     5338691   5353465  12108087   _BM_P_CheckSight
   4.24%   4.25%   4.40%     5201549   5210478   5403732   R_AddSpriteSpans
   3.51%   3.52%   3.67%     4312878   4323123   4499808   _PIT_CheckLine
   3.48%   3.50%   5.45%     4273997   4290026   6691838   _R_DrawVisSprite
   3.11%   3.11%   5.93%     3816888   3820683   7271711   hrsc_locate_q
   2.96%   2.97%   3.01%     3631793   3650604   3698372   R_ViewTestSpriteLines
   2.95%   2.96%  12.86%     3624952   3635426  15779426   _P_LookForPlayers
   2.83%   2.83%  31.63%     3471835   3478366  38821747   _P_SetMobjState
   2.82%   2.82%   2.84%     3458202   3464913   3482927   strcmp_8
   2.08%   2.09%   2.13%     2547474   2563294   2611184   _R_PointInSubsector
   1.77%   1.79%   1.84%     2170838   2196336   2258589   _R_DrawColumn
   1.55%   1.55%  15.31%     1898250   1904655  18795481   _P_TryMove
   1.42%                     1745091                       build_ssector
   1.25%                     1539016                       R_AddLine_loop
   1.25%   1.25%  14.77%     1537551   1540173  18123434   _A_Look
   1.16%   1.16%   2.16%     1426750   1428178   2647511   D_FlatCLUTGen_32x32
   1.14%                     1394429                       R_SpriteColumnShader_Masked2
   1.08%   1.08%   1.92%     1325004   1327975   2361690   _P_SetThingPosition
   1.07%   1.07%  13.67%     1312201   1314226  16780944   _A_Chase
   0.98%   0.99%   1.01%     1205616   1216262   1240326   correct_element
   0.93%   0.93%   6.61%     1136941   1141009   8117646   R_FlushDeferredSurfaces
   0.91%   0.92%   0.96%     1120123   1126028   1172402   _PIT_CheckThing
   0.85%                     1046205                       R_AddOverlappingSprites
   0.84%   0.84%   1.50%     1028206   1029265   1834820   _subframe_block
   0.83%   0.83%   8.48%     1014894   1015853  10405466   R_SubSectorTryFlush
   0.80%   0.80%   0.81%      975740    979697    990967   add_wall_segment
   0.79%                      965267                       R_BSPHyperPlane
   0.77%   0.77%   0.84%      941801    944490   1036643   _G_BuildTiccmd
   0.75%   0.75%  11.90%      922237    924824  14599107   _P_Move
   0.74%   0.75%  10.62%      914099    917133  13028480   _P_CheckSight
DSP side:

Code: Select all

Used cycles:
  47.11%                  10712705622                       command_base
  23.65%  24.98%  24.98%  5378011588 5679677996 5679677996   R_DoColumnPerspCorrect
   7.66%   7.66%   7.66%  1742325930 1742325930 1742325930   VPRenderSpanQuickMip
   3.77%                   857714886                       R_VPLoadTexture
   3.33%                   758388322                       ALGO_P_CrossBSPNode
   1.75%   1.75%   1.75%   397678290 397678290 397678290   extract_subvisplane
   1.60%                   364192266                       R_DoColumnTextureUV
   1.20%                   273246874                       P_CrossSubsector_body
   1.12%   1.12%   1.12%   254464186 254464186 254464186   VPRenderSpanWarp
   0.96%                   219362446                       R_ViewTestAddLine
   0.89%   1.03%  20.95%   202103492 234228334 4763524526   AddMidWall
   0.70%                   158056956                       project_node
   0.68%                   154335382                       R_VPRenderPlane
   0.55%                   125310132                       R_VPRenderSky
   0.53%   0.64%   2.76%   121226142 145602512 627900908   AddLowerWall
It looks fairly similar to the first level, but hopefully this is now a much more representative sample.
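The percentage and cycle columns in the DSP table can be cross-checked against each other. A minimal sketch, assuming each percentage is that symbol's share of the total cycle count (the helper names `estimate_total` and `share` are made up for illustration and are not part of the profiler):

```python
def estimate_total(count: int, percent: float) -> float:
    """Estimate the total counter value from one row's count and its percentage."""
    return count / (percent / 100.0)


def share(count: int, total: float) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Values copied from the "Used cycles" rows above.
total_cycles = estimate_total(10712705622, 47.11)   # command_base

print(share(857714886, total_cycles))   # R_VPLoadTexture, listed as 3.77%
print(share(397678290, total_cycles))   # extract_subvisplane, listed as 1.75%
```

Because the total is estimated from a percentage rounded to two decimals, the check is only good to about that precision.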

It might make sense to also take a profile (manually) from just the last level demo, as it's the slowest (the first of the four is the second slowest). Attached are a couple of teaser Hatari screenshots from that last level (converted to 8-bit to decrease their size).
grab0001.png
grab0002.png
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: I took a profile of the Doom I timedemo running through all 4 levels, not just the first one, including the level loads. This was from the second round of running it, so anything that gets cached to disk has already been done.
The floor textures don't get cached to disk yet, so this still shows up when it should not be seen:

4.06% 4.07% 4.13% 52663768 52732863 53580083 D_FlatMipGen_8_16

...and this one will be largely gone (or at least, absorbed into something else) after the next changes:

4.06% 4.06% 4.16% 52573132 52676804 53958137 _V_DrawPatch
Eero Tamminen wrote: It might make sense to also take a profile (manually) from just the last level demo, as it's the slowest (the first of the four is the second slowest). Attached are a couple of teaser Hatari screenshots from that last level (converted to 8-bit to decrease their size).
This level is quite problematic. It's sort of a worst case 'no occlusion' scenario, where the whole map is visible from the startpoint - it consists mainly of a large number of sectors with open joins and height differences. It averages 1 wall per BSP subsector. Very inefficient. AIs will also spend a lot of time crossing edges on this map.

I remember playing this one on my old 486 when I bought the UDoom CD around '98ish. It was slow on that too :) I think it's out of scope for this port but it's useful for optimizing some things. I used it to work on the deferred rendering mechanism for wall surfaces (buffer as much as possible until something forces a commit, possibly several subsectors at a time).
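The "buffer until something forces a commit" idea can be sketched roughly as below. This is only an illustration, not the BMEngine code: the `DeferredSurfaceQueue` class, its capacity and the `(subsector, wall)` tuples are all assumptions, and a full buffer is just one possible trigger for a commit.

```python
from typing import Callable, List, Tuple

Surface = Tuple[int, str]  # (subsector id, wall description), purely illustrative


class DeferredSurfaceQueue:
    """Collects wall surfaces and hands them to the renderer in batches."""

    def __init__(self, capacity: int, commit: Callable[[List[Surface]], None]):
        self.capacity = capacity
        self.commit = commit
        self.pending: List[Surface] = []

    def add(self, surface: Surface) -> None:
        # A full buffer is one of the things that forces a commit.
        if len(self.pending) >= self.capacity:
            self.flush()
        self.pending.append(surface)

    def flush(self) -> None:
        # Commit everything buffered so far as a single batch.
        if self.pending:
            self.commit(self.pending)
            self.pending = []


batches: List[List[Surface]] = []
queue = DeferredSurfaceQueue(capacity=4, commit=batches.append)

# Six subsectors with one wall each, as on the map described above.
for subsector in range(6):
    queue.add((subsector, "wall"))
queue.flush()  # end of frame forces the final commit

print([len(batch) for batch in batches])
```

With one wall per subsector, batching across several subsectors is what keeps the number of commits well below the number of subsectors.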
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

BTW in order to be able to play as many maps as possible, and/or to get the highest playable resolution possible, there will be detail modes which exclude textures from walls or floors, using an averaged texture colour for the fill. There will still be depth cue and lighting but without textures. This increases fillrate but also removes the cost of transmitting textures to the DSP, which can be significant if the map allows many different floor textures to be seen at once.

By default everything will be on, and nothing lost from the original (some areas improved where possible, e.g. use of truecolour) but some stuff will be configurable for the sake of flexibility and more demanding maps which do exist.
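As a rough illustration of the "averaged texture colour" fill, here is a minimal sketch that reduces a texture to a single colour. It assumes texels are plain RGB tuples, which is an assumption made for the example; the real texture format is not described in this thread, and `average_colour` is a made-up name.

```python
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def average_colour(texels: Iterable[RGB]) -> RGB:
    """Return the per-channel mean of a texture's texels (integer RGB)."""
    total_r = total_g = total_b = count = 0
    for r, g, b in texels:
        total_r += r
        total_g += g
        total_b += b
        count += 1
    if count == 0:
        raise ValueError("texture has no texels")
    return (total_r // count, total_g // count, total_b // count)


# A tiny 2x2 "texture": two dark grey and two light brown texels.
texture = [(64, 64, 64), (64, 64, 64), (160, 120, 80), (160, 120, 80)]
print(average_colour(texture))
```

The result only needs computing once per texture, so it can be stored alongside the texture instead of being sent to the DSP as texel data.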
dma
Atari God
Posts: 1223
Joined: Wed Nov 20, 2002 11:22 pm
Location: France

Re: Bad Mood : Falcon030 'Doom'

Post by dma »

Really happy to read that, I was expecting those sorts of options to be included for playability adjustment.

Also running a non textured DOOM will give it a "virtual reality 0.1 / Tron" feeling, which can be as cool as fully textured mode. :)
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:BTW in order to be able to play as many maps as possible, and/or to get the highest playable resolution possible, there will be detail modes which exclude textures from walls or floors, using an averaged texture colour for the fill. There will still be depth cue and lighting but without textures.
That sounds great!

If you have that already implemented (but without an end-user option for enabling it), could you provide a screenshot of what it looks like?

(I didn't see it mentioned in the README.known-issues performance sections.)
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

I'll be very busy this weekend but should be able to post something on the 'untextured' approach fairly soon.

There is no switch for this yet but some tests have been done and most of the stuff needed is already present in some form. It just needs glue and an optimized version of the two main paths for walls and floors.


I recently got weapon overlays drawing via BMEngine instead of the C Doom game code, so that also provides a small speed increase. This change caused a small ordering bug which will need fixing before I can move on.
dma
Atari God
Posts: 1223
Joined: Wed Nov 20, 2002 11:22 pm
Location: France

Re: Bad Mood : Falcon030 'Doom'

Post by dma »

By the way, could these non-textured modes, and/or maybe some other simplifications, make official DOOM wads run on a 4 MB Falcon?
Perhaps some automatically lowered sprite resolution could be doable as well and help with this?

All this would probably require some kind of precalc on first run, or something.

Just some thoughts, I don't imagine this is easy to do at all.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

dma wrote: By the way, could these non-textured modes, and/or maybe some other simplifications, make official DOOM wads run on a 4 MB Falcon?
Perhaps some automatically lowered sprite resolution could be doable as well and help with this?

All this would probably require some kind of precalc on first run, or something.

Just some thoughts, I don't imagine this is easy to do at all.
It certainly makes the prospect more realistic. However, a number of other things still get in the way (Doom itself still wants about 2-3 MB and BMEngine uses all of what's left). I'll look at those once it's all working properly in 14 MB.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Tonight I got a bit of time to do a little work on the 'flat filled' approach. Mostly changes to the texture format which has to generate and store the colour tables for this. The rendering side is just a temporary hack, only a bit faster than the textured version. I'll do that bit later. For now, this is how it looks.
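One way to picture the stored colour tables is one precomputed shade of the fill colour per light level, so lighting and depth cue become a table lookup. The sketch below assumes a simple linear ramp from black to full brightness and a hypothetical number of levels; the actual table layout in the texture format is not given here.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def build_shade_table(base: RGB, levels: int) -> List[RGB]:
    """Precompute `levels` shades of `base`, from black (index 0) to full brightness."""
    if levels < 2:
        raise ValueError("need at least two light levels")
    top = levels - 1
    return [tuple(channel * i // top for channel in base) for i in range(levels)]


# Hypothetical averaged fill colour and a deliberately small table.
table = build_shade_table((112, 92, 72), levels=5)
for shade in table:
    print(shade)
```

A renderer would then pick `table[light_level]` per span or column instead of sampling a texture.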
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

It's probably a good thing I did this now, because it showed up a bug in the floor drawing which killed the depth cue lighting effect. It probably got broken in the last couple of weeks, but it means both videos were probably recorded without depth cue in the floors. It's not that noticeable but still pretty annoying.

Fixing it hasn't affected speed, it was just using the wrong table entry due to a small error while reworking some DSP code. :-z
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:Tonight I got a bit of time to do a little work on the 'flat filled' approach. Mostly changes to the texture format which has to generate and store the colour tables for this. The rendering side is just a temporary hack, only a bit faster than the textured version. I'll do that bit later. For now, this is how it looks.
Thanks, I think it looks great!

It's easy to see the walls and corners etc. Hopefully this mode also gets rid of the flickering lights and instead uses some averaged lighting value in areas which had those headache-inducing special effects. ;-)

Only thing which might be slightly distracting style-wise is the wall with diagonal holes in it (in the first picture) because it's still textured unlike other walls. Would it (eventually) help performance if those (partially) see-through walls were also flat shaded?
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: It's easy to see the walls and corners etc. Hopefully this mode also gets rid of the flickering lights and instead uses some averaged lighting value in areas which had those headache-inducing special effects. ;-)
Nope, the lights still strobe and flicker :) But it's easy enough to control that stuff from a config file. I would say that doing so by default would interfere with the game because the sector lighting is used to turn lights on and off and attract attention to things for gameplay reasons. (Apart from creating atmosphere, plus inducing headaches in some cases! esp. one underground bit in e1m2 which is just a strobe-fest in darkness).
Eero Tamminen wrote: Only thing which might be slightly distracting style-wise is the wall with diagonal holes in it (in the first picture) because it's still textured unlike other walls. Would it (eventually) help performance if those (partially) see-through walls were also flat shaded?
Since I haven't written a proper renderer for this mode yet (it's just a hack/mockup), I didn't bother converting all the shader cases. So that would also be drawn flatshaded in the end.

There is in fact a much bigger gameplay problem with flatshading everything - finding the damn switches! :)

Might be wise to detect any multi-patch textures with a patch 'floating' in the middle of it and cheat - render it as a texture but flatshade the patches in the texture source except for the part that is the switch. Tricky and messy but I can't think of anything else sensible.
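The "floating patch" detection could look something like the sketch below. It is a guess at a heuristic, not the port's code: the `Patch` record and `floating_patches` function are hypothetical, and "floating" is taken to mean a patch in a multi-patch texture whose rectangle touches none of the texture's edges.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Patch:
    x: int       # left edge within the texture
    y: int       # top edge within the texture
    width: int
    height: int


def floating_patches(tex_width: int, tex_height: int, patches: List[Patch]) -> List[Patch]:
    """Return patches of a multi-patch texture that touch none of its edges."""
    if len(patches) < 2:
        return []
    return [
        p for p in patches
        if p.x > 0 and p.y > 0
        and p.x + p.width < tex_width
        and p.y + p.height < tex_height
    ]


# Hypothetical 64x128 wall: a full-size background patch plus a small switch.
wall = [Patch(0, 0, 64, 128), Patch(16, 72, 32, 32)]
print(floating_patches(64, 128, wall))
```

Patches returned here would stay textured while the rest of the composite is flat-filled. Switches that reach a texture edge would be missed, so this is only a starting point.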
dma
Atari God
Posts: 1223
Joined: Wed Nov 20, 2002 11:22 pm
Location: France

Re: Bad Mood : Falcon030 'Doom'

Post by dma »

dml wrote:Tonight I got a bit of time to do a little work on the 'flat filled' approach. Mostly changes to the texture format which has to generate and store the colour tables for this. The rendering side is just a temporary hack, only a bit faster than the textured version. I'll do that bit later. For now, this is how it looks.
Wow! :D I love that flat rendering, and I can't imagine how great it must look when animated.
This is an enhancement to me, not a downgrade option. :wink:
