Bad Mood : Falcon030 'Doom'


dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

dma wrote: Wow! :D I love that flat rendering, and i can't imagine how great it must look when animated.
This is an enhancement to me, not a downgrade option. :wink:
Well it does look a bit more retro. Also more like an ST game. Unfortunately it would be a massive job to *fully* optimize the game for flatshading because 98% of the code is still working on the assumption that textures and perspective are present.

It wouldn't make sense to do that anyway, since it would mean losing textured mode.

Anyway I have committed a test version of this for Eero to play with :) It can only be turned on/off using a build flag.
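For anyone wondering why flat shading doesn't simply fall out of the textured renderer: at the innermost level the difference looks roughly like the C sketch below. It assumes a Doom-style vertical column drawer with a 16.16 fixed-point texture step and a 128-texel column; the function names and `flat_colour` are hypothetical, and BM's real CPU/DSP split and perspective correction are not modelled. The flat version needs no texture coordinates at all, so presumably most of the remaining work would be in the upstream setup code that exists to produce them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical textured column drawer: steps a 16.16 fixed-point texture
 * coordinate down the column and fetches one texel per output pixel. */
static void draw_column_textured(uint8_t *dst, size_t pitch, int count,
                                 const uint8_t tex_column[128],
                                 uint32_t v, uint32_t v_step)
{
    assert(dst != NULL && tex_column != NULL && count >= 0);
    for (int i = 0; i < count; i++) {
        *dst = tex_column[(v >> 16) & 127];  /* texel fetch per pixel */
        v += v_step;                         /* texture stepping per pixel */
        dst += pitch;
    }
}

/* Hypothetical flat-shaded drawer: one precomputed colour per surface,
 * so there is no texture coordinate, no stepping and no texel fetch. */
static void draw_column_flat(uint8_t *dst, size_t pitch, int count,
                             uint8_t flat_colour)
{
    assert(dst != NULL && count >= 0);
    for (int i = 0; i < count; i++) {
        *dst = flat_colour;
        dst += pitch;
    }
}
```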
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

I've now tried it, and while I think it looks nice, some things like the poison liquids don't look quite as damaging as they do with textures, i.e. the game is missing some visual cues with flat shading. But IMHO the second timedemo level (the one with a large number of boxes in it) looks better with flat shading.

BTW, do you have a format version number in the cached data, and check for it? BM looked pretty weird and crashy with the earlier cached data, before I removed that obsolete data. :-)

I think you can also enable the statusbar again. Are the VILE path warnings still there?
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote:I've now tried it and while I think it looks nice, some things like the poison liquids don't look quite as damaging as they do with textures... I.e. game is missing some visual cues with flat shading. But IMHO second timedemo level (one with large number of boxes in it) looks better with flat shading.
Finding secrets, switches and other cues will be a problem yes. TBH I don't know how much effort I want to put into the flat shaded version because it's not clear where the work would actually cut off. But I'll try to deal with the most obvious problems at least.
Eero Tamminen wrote: Btw. Do you have some format version number in the cached data & check for it? BM looked pretty weird and crashy with the earlier cached data, before I removed that obsolete data. :-)
No, it will go in soon though. And yes I ended up trying to debug a non-bug because of that :)
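A minimal sketch of the kind of check Eero is asking about, assuming each cache file starts with a small header holding a magic tag and a format version. The names, the 'BMC1' tag and the version number are all hypothetical. Since the cache is written and read by the same build on the same machine, reading the header back as a raw struct is fine here; a stale or truncated header just means "delete and rebuild".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical header at the start of every cached resource file. */
#define CACHE_MAGIC   0x424D4331u  /* the characters 'B','M','C','1' as a 32-bit value */
#define CACHE_VERSION 3u           /* bump whenever the cached data layout changes */

typedef struct {
    uint32_t magic;
    uint32_t version;
} cache_header;

/* 1 if the header was written by a build with the same cache format, else 0. */
static int cache_header_ok(const cache_header *h)
{
    assert(h != NULL);
    return h->magic == CACHE_MAGIC && h->version == CACHE_VERSION;
}

/* Reads the header from the start of an open cache file and validates it.
 * A 0 result means "stale or damaged: delete the file and rebuild it". */
static int cache_file_ok(FILE *f)
{
    cache_header h;
    assert(f != NULL);
    if (fread(&h, sizeof h, 1, f) != 1)
        return 0;  /* a truncated file counts as stale too */
    return cache_header_ok(&h);
}
```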

It also needs an extra level of directory specialization, corresponding to the WAD. Probably the PWAD first, then IWAD as fallback (since the IWAD version is normally implied by the PWAD). This could cause a lot of files to be created! But the alternative is resource collisions of unknown consequence.
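One way the per-WAD directory could be chosen, as a sketch: key the cache directory by the PWAD's base name when a PWAD is loaded, and by the IWAD's otherwise. The backslash separator, the upper-casing and the function names are assumptions for illustration, not BM code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copies the base name of a WAD path (no directory, no extension),
 * upper-cased, into out: "C:\\WADS\\mtfactor.wad" -> "MTFACTOR". */
static void wad_base_name(const char *path, char *out, size_t out_size)
{
    const char *base = path;
    size_t n = 0;
    assert(path != NULL && out != NULL && out_size > 0);
    for (const char *p = path; *p != '\0'; p++)
        if (*p == '\\' || *p == '/' || *p == ':')
            base = p + 1;
    while (base[n] != '\0' && base[n] != '.' && n + 1 < out_size) {
        out[n] = (char)toupper((unsigned char)base[n]);
        n++;
    }
    out[n] = '\0';
}

/* Builds "<root>\<WAD>" for the resource cache: keyed by the PWAD when one
 * is loaded (it normally implies the IWAD), otherwise by the IWAD.
 * Returns 0 if the result did not fit in out. */
static int cache_dir_for(const char *root, const char *iwad, const char *pwad,
                         char *out, size_t out_size)
{
    char key[64];
    const char *src = (pwad != NULL && strlen(pwad) > 0) ? pwad : iwad;
    assert(root != NULL && iwad != NULL && out != NULL && out_size > 0);
    wad_base_name(src, key, sizeof key);
    int n = snprintf(out, out_size, "%s\\%s", root, key);
    return n > 0 && (size_t)n < out_size;
}
```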
Eero Tamminen wrote: I think you can also enable the statusbar again. The VILE path warnings are still there?
The real reason the status bar is disabled is that I haven't implemented a double-buffered drawing solution for it yet, i.e. it wasn't related to that bug you noticed (in fact I never hit the bug because my SB has been off for a while!).

With the current SB hack, everything gets drawn twice per frame, which is not ideal. It also can't cope with anything but the default 320-pixel resolution. I'll be dealing with it next.
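A possible shape for a double-buffered status bar, sketched in C under the assumption of two screen buffers and a single "status bar changed" event: each buffer remembers which status-bar generation it last received, so a change costs one redraw per buffer and an unchanged bar costs nothing, instead of being redrawn every frame. All names are hypothetical; the commented-out `draw_statusbar()` stands in for the real drawing call.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BUFFERS 2  /* double buffering */

/* Hypothetical bookkeeping: which status-bar state each screen buffer holds. */
typedef struct {
    uint32_t generation;              /* bumped whenever SB contents change */
    uint32_t drawn_gen[NUM_BUFFERS];  /* generation last drawn into each buffer */
    int      draws;                   /* counts actual redraws, for illustration */
} statusbar_cache;

static void sb_init(statusbar_cache *sb)
{
    sb->generation = 1;  /* buffers start at 0, so both begin stale */
    for (int i = 0; i < NUM_BUFFERS; i++)
        sb->drawn_gen[i] = 0;
    sb->draws = 0;
}

/* Call when health, ammo, face etc. change. */
static void sb_invalidate(statusbar_cache *sb) { sb->generation++; }

/* Call once per frame for the buffer about to be rendered into. */
static void sb_update(statusbar_cache *sb, int back_buffer)
{
    assert(back_buffer >= 0 && back_buffer < NUM_BUFFERS);
    if (sb->drawn_gen[back_buffer] != sb->generation) {
        /* draw_statusbar(back_buffer);  -- hypothetical real drawing call */
        sb->draws++;
        sb->drawn_gen[back_buffer] = sb->generation;
    }
}
```

With two buffers alternating, ten unchanged frames cost two draws in total, and each later change costs two more.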

BTW I just added a special fastpath for more distant sprites, and implemented a distance (screenspace pixel area) cull on sprites to save some CPU time. I don't know what impact it has on demo profiling but it should certainly be trying to draw fewer objects now, and with less setup overhead for small ones.
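A sketch of that kind of screen-area classification, assuming the projection gives a 16.16 fixed-point scale factor per sprite. The two thresholds are made-up numbers, not BM's, and are exactly the kind of constant that would need tuning against profiles. The 64-bit intermediate is only there to keep the sketch overflow-free.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical thresholds, in screen pixels of projected area. */
#define SPRITE_CULL_AREA     4    /* smaller than this: don't draw at all */
#define SPRITE_FASTPATH_AREA 256  /* smaller than this: use the cheap drawer */

typedef enum { SPRITE_CULLED, SPRITE_FAST, SPRITE_FULL } sprite_path;

/* scale_fx is the projection scale in 16.16 fixed point (e.g. focal / z);
 * width and height are the sprite patch size in texels. */
static sprite_path classify_sprite(int width, int height, int32_t scale_fx)
{
    assert(width >= 0 && height >= 0 && scale_fx >= 0);
    int64_t w = ((int64_t)width  * scale_fx) >> 16;  /* projected width in pixels */
    int64_t h = ((int64_t)height * scale_fx) >> 16;  /* projected height in pixels */
    int64_t area = w * h;
    if (area < SPRITE_CULL_AREA)     return SPRITE_CULLED;
    if (area < SPRITE_FASTPATH_AREA) return SPRITE_FAST;
    return SPRITE_FULL;
}
```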
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

It's not terribly useful but you can also drop the colour count in the textures to something silly. e.g. set them to 4 or 16 and delete the BMC cache folder.

bldcfg_sky_remap_maxcols =
bldcfg_texture_remap_maxcols =
bldcfg_cpuflat_remap_maxcols =
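If you're curious what capping the colour count does, here's a naive C sketch of one way to do it per texture: keep the `maxcols` most-used palette indices and remap everything else to the nearest kept colour by squared RGB distance. This popularity-based reduction is chosen for brevity and is not necessarily what BM's own remapping does; the `rgb` type and function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb;

/* Reduces an 8-bit indexed texture to at most maxcols distinct colours.
 * Returns the number of colours kept. */
static int reduce_texture_colours(uint8_t *pixels, size_t count,
                                  const rgb palette[256], int maxcols)
{
    size_t hist[256] = {0};
    int is_kept[256] = {0};
    uint8_t kept[256];
    uint8_t remap[256];
    int nkept = 0;

    assert(maxcols > 0 && maxcols <= 256);
    for (size_t i = 0; i < count; i++)
        hist[pixels[i]]++;

    /* Pick the most frequently used colours, one at a time. */
    while (nkept < maxcols) {
        int best = -1;
        for (int c = 0; c < 256; c++)
            if (!is_kept[c] && hist[c] > 0 && (best < 0 || hist[c] > hist[best]))
                best = c;
        if (best < 0)
            break;  /* fewer colours in use than maxcols */
        is_kept[best] = 1;
        kept[nkept++] = (uint8_t)best;
    }

    /* Map each palette index to itself, or to the nearest kept colour. */
    for (int c = 0; c < 256; c++) {
        remap[c] = (uint8_t)c;
        if (is_kept[c] || nkept == 0)
            continue;
        long best_d = -1;
        for (int k = 0; k < nkept; k++) {
            long dr = (long)palette[c].r - palette[kept[k]].r;
            long dg = (long)palette[c].g - palette[kept[k]].g;
            long db = (long)palette[c].b - palette[kept[k]].b;
            long d = dr * dr + dg * dg + db * db;
            if (best_d < 0 || d < best_d) {
                best_d = d;
                remap[c] = kept[k];
            }
        }
    }

    for (size_t i = 0; i < count; i++)
        pixels[i] = remap[pixels[i]];
    return nkept;
}
```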
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

Here's the Doom II timedemo (first level) profile with flat shading, before your distant-sprites fastpath...

CPU side:

Code:

Time spent in profile = 70.63560s.
...
Executed instructions:
  21.69%                    28137582                       R_AdvanceSurface_flatshade
   8.17%   8.25%   8.75%    10604029  10709126  11359076   _BM_P_CrossBSPNode
   7.30%                     9474145                       R_SpriteColumnShader_Masked2
   6.89%                     8945691                       R_VisPlaneFlatShader
   6.36%   6.37%   6.62%     8256622   8264924   8586349   _R_PointInSubsector
   5.83%   5.84%  35.15%     7567103   7579267  45612020   _P_RunThinkers
   4.46%   4.47%  13.75%     5784277   5799400  17841045   _P_CheckPosition
   3.54%   3.55%   3.69%     4598330   4605286   4785966   init_stategroups
   2.80%                     3632332                       R_BSPHyperPlane
   2.49%   2.49%   2.49%     3225751   3227551   3227551   _BM_A_Mux3x2
   2.23%   2.23%   2.42%     2892354   2899070   3135074   _PIT_CheckThing
   1.55%   1.55%   1.65%     2010903   2012014   2142398   R_ViewTestSpriteLines
   1.38%   1.39%   1.50%     1796108   1799235   1952345   _P_UpdateSpecials
   1.37%                     1775288                       R_AddLine_loop
   1.33%   1.33%   1.45%     1725257   1728969   1879838   R_AddSpriteSpans
   1.23%   1.23%   1.23%     1592692   1593732   1593732   _BM_A_Mux2x2
   1.22%                     1578464                       R_StackTransparentSurface
   1.20%   1.23%   1.33%     1556192   1592782   1724284   stack_visplane_area
   1.00%   1.01%   1.05%     1303241   1305669   1363606   R_SetSubSectorLuma
   0.93%                     1209173                       R_DrawSurface_flatshade
   0.92%   0.92%   1.07%     1196467   1200162   1394448   _PIT_CheckLine
   0.87%   0.87%   0.87%     1133344   1134064   1134064   _BM_A_Mux1x2
   0.85%                     1099162                       build_ssector
   0.82%   0.82%   2.09%     1068312   1061738   2707701 * get_ssector
   0.80%   0.80%   9.62%     1041176   1044278  12480621   _BM_P_CheckSight
   0.73%   0.73%   9.03%      943806    945601  11716654   _P_LookForPlayers
   0.68%   0.68%   0.72%      884801    886478    931864   add_wall_segment
   0.60%   0.61%  14.56%      782750    785514  18887150   _P_Move
   0.55%   0.55%  15.86%      708808    713319  20580805   _P_TryMove
DSP side:

Code:

Used cycles:
  51.61%                  1169584422                       command_base
  13.15%  14.97%  14.97%   298031440 339329002 339329002   R_DoColumnPerspCorrect
   6.35%                   143877608                       R_VPRenderFlat
   5.25%                   118908416                       ALGO_P_CrossBSPNode
   2.99%   2.99%   2.99%    67873266  67873266  67873266   extract_subvisplane
   2.96%                    67169814                       P_CrossSubsector_body
   2.41%                    54606832                       R_ViewTestAddLine
   2.37%                    53658514                       R_DoColumnTextureUV
   1.53%                    34704816                       project_node
   1.34%   1.63%   4.68%    30265242  36865978 106005680   AddLowerWall
   1.10%   1.29%  13.26%    25016872  29169880 300529572   AddMidWall
   1.01%                    22941384                       R_CheckBBoxPair
   0.97%   0.97%   0.97%    21873920  21884536  21884536   InterceptVectorsUF
   0.72%   0.89%   1.79%    16284102  20110148  40660668   AddUpperWall
   0.72%   0.00%   0.00%    16257946     78938     78938 * Divs48_Real
   0.57%   0.57%   0.57%    12841422  12841422  12841422   R_BufferSurface
All disk reads during gameplay seem to be D_TextureCacheIn() calls.


And the worst frame, thinking part:

Code:

Time spent in profile = 0.24291s.
...
Executed instructions:
  17.63%  17.78%  18.70%       71196     71821     75530   _BM_P_CrossBSPNode
  15.54%  15.55%  16.47%       62735     62812     66521   _R_PointInSubsector
   9.97%   9.99%  33.96%       40247     40352    137135   _P_CheckPosition
   9.05%   9.07%  74.62%       36553     36618    301332   _P_RunThinkers
   8.61%   8.62%   8.62%       34770     34790     34790   _BM_A_Mux3x2
   5.39%   5.41%  15.00%       21774     21867     60592   _P_PathTraverse
   4.94%   4.95%   5.89%       19934     19987     23788   _PIT_CheckThing
   3.85%   3.86%   3.86%       15559     15597     15597   _PIT_AddLineIntercepts_L
   3.04%   3.05%   3.05%       12295     12306     12306   init_stategroups
   2.10%   2.11%   3.03%        8470      8509     12256   _P_UpdateSpecials
   1.99%   2.00%   3.03%        8029      8078     12245   _PIT_CheckLine
   1.60%   1.59%  20.12%        6464      6418     81256 * _BM_P_CheckSight
   1.39%   1.39%  33.35%        5600      5600    134668   _P_Move
   1.33%   1.33%  16.72%        5360      5387     67537   _P_LookForPlayers
   1.23%   1.24%  38.08%        4987      4994    153778   _P_TryMove
   1.14%   1.14%   1.14%        4609      4609      4609   _P_PointOnDivlineSide
   1.12%   1.12%   2.26%        4514      4514      9123   _PIT_AddThingIntercepts
   0.87%   0.87%  57.67%        3497      3497    232878   _P_SetMobjState
   0.70%   0.70%   1.56%        2810      2824      6287   _PTR_ShootTraverse
   0.68%   0.68%  30.02%        2752      2752    121236   _P_NewChaseDir
   0.63%                        2561                       R_StackTransparentSurface

DSP side:

Used cycles:
  76.98%                     5999616                       command_base
   9.54%                      743748                       ALGO_P_CrossBSPNode
   6.23%                      485846                       P_CrossSubsector_body
   1.91%   1.91%   1.91%      148540    148652    148652   InterceptVectorsUF
   1.31%                      101896                       Divs48_Real
   1.30%                      101630                       ALGO_P_LineIntercept
   1.17%                       90894                       R_DoColumnTextureUV
   0.75%   0.75%   0.75%       58144     58144     58144   TestLineSegVectorBisection
What does init_stategroups() do? Its cost is in these three lines (everything else in that routine is called only once):

Code:

$05322c :             move.w    d1,(a0)                    1.01% (4096, 32880, 0)
$05322e :             addq.l    #8,a0                      1.01% (4096, 16384, 2)
$053230 :             dbra      d0,$5322c                  1.01% (4096, 32772, 1)
Rendering part:

Code:

Time spent in profile = 0.18234s.
...
Executed instructions:
  40.79%                      149751                       R_SpriteColumnShader_Masked2
  19.33%                       70972                       R_AdvanceSurface_flatshade
   7.88%                       28928                       R_VisPlaneFlatShader
   6.63%   6.63%   6.63%       24339     24339     24339   _BM_A_Mux3x2
   3.50%                       12850                       R_BSPHyperPlane
   2.52%                        9253                       R_StackTransparentSurface
   2.41%   2.41%   2.54%        8865      8830      9313 * R_AddSpriteSpans
   1.84%                        6770                       R_AddLine_loop
   1.77%   1.80%   1.80%        6498      6616      6616   stack_visplane_area
   1.70%   1.71%   1.71%        6253      6260      6260   R_ViewTestSpriteLines
   1.10%                        4040                       build_ssector
   1.07%                        3933                       R_DrawSurface_flatshade
   1.07%   1.07%   2.76%        3914      3914     10129   get_ssector
   1.06%   1.07%   1.07%        3907      3914      3914   R_SetSubSectorLuma
   1.04%   1.04%   1.04%        3827      3827      3827   add_wall_segment
   0.58%   0.58%  27.32%        2127      2134    100289   R_FlushDeferredSurfaces

DSP side:

Used cycles:
  53.03%                     3102132                       command_base
  13.40%  15.48%  15.48%      783788    905616    905616   R_DoColumnPerspCorrect
   7.42%                      434260                       R_VPRenderFlat
   4.22%                      246676                       R_ViewTestAddLine
   3.72%   3.72%   3.72%      217682    217682    217682   extract_subvisplane
   3.20%                      187034                       R_DoColumnTextureUV
   2.18%                      127394                       project_node
   1.90%   2.23%   6.33%      111004    130608    370238   AddLowerWall
   1.35%                       79206                       R_CheckBBoxPair
   1.16%   1.41%   3.26%       67866     82316    190484   AddUpperWall
   1.10%   1.28%  12.12%       64408     75008    709218   AddMidWall
   0.88%   0.88%   0.88%       51482     51482     51482   R_BufferSurface
   0.75%   1.04%   3.11%       43960     61038    182044   AddTransWall
   0.55%   0.55%   1.21%       32340     32340     70848   R_SetupSurface
The slowest rendering frame is faster than before, and it seems to be bound more by sprite rendering than by 3D rendering.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:BTW I just added a special fastpath for more distant sprites, and implemented a distance (screenspace pixel area) cull on sprites to save some CPU time. I don't know what impact it has on demo profiling but it should certainly be trying to draw fewer objects now, and with less setup overhead for small ones.
At least based on a Hatari profile of the Doom II timedemo, your last commit is actually very slightly more expensive than before:

Code:

Time spent in profile = 70.71545s.
And in the worst-frame profile, the render time of the worst frame also went very slightly up, but curiously, the thinking time of the worst frame went slightly down...
AnthonyJ
Captain Atari
Posts: 165
Joined: Sat Jan 26, 2013 8:16 am

Re: Bad Mood : Falcon030 'Doom'

Post by AnthonyJ »

dml wrote: Well it does look a bit more retro. Also more like an ST game
Also, more like Dview and the early builds of BM, you mean ;)
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: What init_stategroups() does? Its cost is in these three lines (everything else in that routine is called only once):

Code:

$05322c :             move.w    d1,(a0)                    1.01% (4096, 32880, 0)
$05322e :             addq.l    #8,a0                      1.01% (4096, 16384, 2)
$053230 :             dbra      d0,$5322c                  1.01% (4096, 32772, 1)

It's important for rendering, but the implementation is quick'n'dirty. It probably got expensive when I fixed a couple of other bugs and raised the resource cache index limit. I'll be changing it to initialize sparsely, at which point it should disappear from the profile.
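A common way to make a per-frame reset sparse is generation stamping: instead of writing every entry each time (the profiled loop stores one word per 8-byte entry, 4096 times), bump a frame counter and reset an entry lazily the first time it is touched in a new frame. Here is a C sketch with hypothetical types and names; only the 4096 entry count and the 8-byte stride come from the profile, and it isn't necessarily how dml made the change.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_GROUPS 4096  /* same count as the profiled loop */

/* Hypothetical 8-byte entry, matching the loop's 8-byte stride. */
typedef struct {
    uint16_t stamp;  /* frame in which this entry was last initialised */
    uint16_t state;
    uint32_t data;
} group_entry;

typedef struct {
    group_entry entry[MAX_GROUPS];
    uint16_t    frame;  /* current frame stamp; 0 means "never initialised" */
} state_groups;

/* One-off full clear at startup; call groups_begin_frame() before groups_get(). */
static void groups_init(state_groups *g)
{
    for (int i = 0; i < MAX_GROUPS; i++)
        g->entry[i].stamp = 0;
    g->frame = 0;
}

/* Per-frame reset: O(1) instead of touching all MAX_GROUPS entries. */
static void groups_begin_frame(state_groups *g)
{
    if (++g->frame == 0) {  /* 16-bit stamp wrapped: do one full clear */
        groups_init(g);
        g->frame = 1;
    }
}

/* Returns the entry, initialising it lazily on its first use this frame. */
static group_entry *groups_get(state_groups *g, int index)
{
    assert(index >= 0 && index < MAX_GROUPS);
    group_entry *e = &g->entry[index];
    if (e->stamp != g->frame) {
        e->stamp = g->frame;
        e->state = 0;
        e->data = 0;
    }
    return e;
}
```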

Eero Tamminen wrote: All disk reads during gameplay seem to be D_TextureCacheIn() calls.

In a sense that's reassuring, but any disk reads will likely interfere with profiling.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote:
dml wrote:BTW I just added a special fastpath for more distant sprites, and implemented a distance (screenspace pixel area) cull on sprites to save some CPU time. I don't know what impact it has on demo profiling but it should certainly be trying to draw fewer objects now, and with less setup overhead for small ones.
At least based on Hatari profile of Doom II timedemo, your last commit is actually very slightly more expensive than earlier:

Code:

Time spent in profile = 70.71545s.
And in the worst frame profile, time for worst frame render also went very slightly up, but curiously, worst frame thinking part time went slightly down...
It's interesting, but I won't take it too literally until I get a chance to measure individual things - there have been quite a lot of different changes going in and it seems there is still some disk activity during profiling too.

While I'm far from sure the fastpath is switching at the ideal distance, it's hard to imagine things getting slower due to drawing fewer sprites. However, I do expect that init_stategroups() got 4x more expensive after one of those recent check-ins, and it might be at least partly to blame.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

http://www.youtube.com/watch?v=W1ZtBCpo0eU

:)

I'd have linked the Doom version but this one made the point better. I must be gettin' old.
CiH
Atari God
Posts: 1266
Joined: Wed Feb 11, 2004 4:34 pm
Location: Middle Earth (Npton) UK

Re: Bad Mood : Falcon030 'Doom'

Post by CiH »

If a megacorp made Bad Mood today...

"Hint:- The Atari Falcon you are playing this on is probably older than you are!" :mrgreen:
"Where teh feck is teh Hash key on this Mac?!"
DarkLord
Ultimate Atarian
Posts: 5790
Joined: Mon Aug 16, 2004 12:06 pm
Location: Prestonsburg, KY - USA

Re: Bad Mood : Falcon030 'Doom'

Post by DarkLord »

dml wrote:http://www.youtube.com/watch?v=W1ZtBCpo0eU

:)

I'd have linked the Doom version but this one made the point better. I must be gettin' old.
That's hilarious Doug. Thanks for the link. The point it makes is very pertinent.
Welcome To DarkForce! http://www.darkforce.org "The Fuji Lives.!"
Atari SW/HW based BBS - Telnet:darkforce-bbs.dyndns.org 1040
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:In a sense reassuring but any disk reads will likely interfere with profiling.
In the GEMDOS HD emulation case, there's no cost for those reads on the TOS or HW side, as the GEMDOS call is handled inside Hatari, not in emulated code/HW. I.e. disk reads alone don't make a frame the "worst frame"; there also needs to be a lot of associated processing on the BM side for them to show up in the profile. Because they can be much worse on real HW, I'm checking them separately.

If one wants to profile them more accurately, one could use an IDE hard disk image, but that's quite inconvenient and still not the same as real HW.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote:
dml wrote:In a sense reassuring but any disk reads will likely interfere with profiling.
In the GEMDOS HD emulation case, there's no cost for those reads on the TOS or HW side as GEMDOS call is done inside Hatari, not in emulated code/HW. I.e. disk reads alone don't cause something to become "worst frame", there needs to be lots of associated processing also on the BM side for them to come up in profile. Because they can be much worse on real HW, I'm checking them separately.

If one wants to profile them more accurately, one could use IDE harddisk image, but that's quite inconvenient and still not same as real HW.
Ok that makes sense.

BTW the expensive 'init' function has been removed so 3%+ should have been reclaimed now.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:BTW the expensive 'init' function has been removed so 3%+ should have been reclaimed now.
Unfortunately, with that change BM now crashes at startup with both Doom versions:

Code:

BM_E_PrecacheLevel:
Zone: used memory: 0x60820
Zone: free memory: 0x11f7e0
Exception 3 (2101) at 540b4 -> e00fb6!
1. CPU breakpoint condition(s) matched 1 times.
        pc = ( 0x8 )
Finalizing costs for 8 non-returned functions:
- 0x53f70: render_wall (return = 0x525d4)
- 0x52580: R_FlushDeferredSurfaces (return = 0x5257a)
- 0x5252a: R_SubSectorTryFlush (return = 0x5274a)
- 0x52640: descend_bsp (return = 0x51df2)
- 0x51ddc: display_engine (return = 0x4a84c)
- 0x4a81c: _BM_R_DrawScene (return = 0x3df3c)
- 0x3ddca: _R_RenderPlayerView (return = 0x22e5c)
- 0x22cfe: _D_Display (return = 0x231b8)
> history 33
...
R_DrawSurface_flatshade:
$054a96 : 2879 0015 3758                       movea.l   $153758,a4                 0.00% (17, 340, 17)
$054a9c : 3a3c 007f                            move.w    #$7f,d5                    0.00% (17, 136, 17)
$054aa0 : 43f8 a206                            lea       $ffffa206.w,a1             0.00% (17, 136, 17)
$054aa4 : 47f8 a202                            lea       $ffffa202.w,a3             0.00% (17, 136, 17)
$054aa8 : 4e7a 0002                            movec     cacr,d0                    0.00% (17, 136, 17)
$054aac : 2f00                                 move.l    d0,-(sp)                   0.00% (17, 204, 0)
$054aae : 7001                                 moveq     #1,d0                      0.00% (17, 68, 17)
$054ab0 : 4e7b 0002                            movec     d0,cacr                    0.00% (17, 272, 17)
$054ab4 : 7c00                                 moveq     #0,d6                      0.00% (17, 0, 0)
$054ab6 : 3c39 0015 3770                       move.w    $153770,d6                 0.00% (17, 272, 17)
$054abc : 99c6                                 suba.l    d6,a4                      0.00% (17, 68, 0)
$054abe : 0813 0000                            btst      #0,(a3)                    0.00% (921, 10988, 0)
$054ac2 : 67fa                                 beq.s     $54abe                     0.00% (921, 7368, 17)
$054abe : 0813 0000                            btst      #0,(a3)                    0.00% (921, 10988, 0)
$054ac2 : 67fa                                 beq.s     $54abe                     0.00% (921, 7368, 17)
$054abe : 0813 0000                            btst      #0,(a3)                    0.00% (921, 10988, 0)
$054ac2 : 67fa                                 beq.s     $54abe                     0.00% (921, 7368, 17)
$054abe : 0813 0000                            btst      #0,(a3)                    0.00% (921, 10988, 0)
$054ac2 : 67fa                                 beq.s     $54abe                     0.00% (921, 7368, 17)
$054ac4 : 3611                                 move.w    (a1),d3                    0.00% (17, 204, 0)
$054ac6 : 6b00 f5dc                            bmi       $540a4                     0.00% (17, 136, 0)
R_EndSurface:
$0540a4 : 4df9 0020 d460                       lea       $20d460,a6                 0.00% (17, 136, 0)
$0540aa : 0813 0000                            btst      #0,(a3)                    0.00% (17, 204, 0)
$0540ae : 67fa                                 beq.s     $540aa                     0.00% (17, 136, 17)
$0540b0 : 3d51 0114                            move.w    (a1),$114(a6)              0.00% (17, 340, 17)
$0540b4 : 4e75                                 rts                                  0.00% (17, 256, 0)
> r
  D0 00000001   D1 00000009   D2 0000FFFF   D3 0000FFFF 
  D4 00000044   D5 0000007F   D6 00000280   D7 00000000 
  A0 001D9D30   A1 FFFFA206   A2 001D9E0C   A3 FFFFA202 
  A4 0F5464A0   A5 00005A89   A6 0020D460   A7 002BA9F4 
USP  0058E040 ISP  002BA9F4 SFC  00000000 DFC  00000000 
CACR 00000001 VBR  00000000 CAAR 00000000 MSP  00000000
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

It seems related to a cache optimization for flatshading. If you turn off the bldcfg_force_flatshading switch in buildcfg.inc then it should work again (it works here in that mode, but crashes in flatshading mode).

The bug should be fixed with the next checkin.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

I have removed a suspect group of changes from the repo, and it seems to be working here in both modes now.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

Now it works again.

Worst frame thinking part: 0.20406s (16% improvement); rendering part: 0.17286s (5% improvement).

The Hatari Doom II timedemo duration decreased slightly from the previous commit (0.24% overall improvement):

Code:

Time spent in profile = 70.54821s.
...
Executed instructions:
  23.47%                    30237170                       R_AdvanceSurface_flatshade
   8.23%   8.31%   8.75%    10606473  10705917  11276516   _BM_P_CrossBSPNode
   7.43%                     9570995                       R_VisPlaneFlatShader
   6.41%   6.42%   6.66%     8256622   8266938   8585798   _R_PointInSubsector
   5.98%                     7707668                       R_SpriteColumnShader_Masked2
   5.87%   5.88%  35.36%     7567103   7579244  45568560   _P_RunThinkers
   4.49%   4.50%  13.80%     5784277   5796476  17783709   _P_CheckPosition
   3.00%                     3869688                       R_BSPHyperPlane
   2.54%   2.54%   2.54%     3276796   3278536   3278536   _BM_A_Mux3x2
   2.24%   2.25%   2.45%     2892354   2899922   3158167   _PIT_CheckThing
   1.71%   1.71%   1.81%     2198954   2203607   2335695   R_ViewTestSpriteLines
   1.46%                     1885797                       R_AddLine_loop
   1.39%   1.40%   1.54%     1796108   1800187   1990306   _P_UpdateSpecials
   1.29%   1.32%   1.41%     1664196   1702281   1818564   stack_visplane_area
   1.25%                     1615054                       R_StackTransparentSurface
   1.20%   1.21%   1.21%     1552109   1552969   1552969   _BM_A_Mux2x2
   1.20%                     1550962                       R_DrawTSurface_Masked1
   1.18%   1.18%   1.31%     1515721   1518846   1687684   R_AddSpriteSpans
   1.08%   1.09%   1.15%     1397794   1400642   1487097   R_SetSubSectorLuma
   1.00%                     1289138                       R_DrawSurface_flatshade
   0.93%   0.93%   1.07%     1196467   1199818   1373455   _PIT_CheckLine
   0.91%                     1167100                       build_ssector
   0.89%   0.89%   0.89%     1143212   1144032   1144032   _BM_A_Mux1x2
   0.89%   0.89%   2.23%     1141090   1143292   2867229   get_ssector
   0.81%   0.81%   9.64%     1041176   1048957  12427840   _BM_P_CheckSight
   0.73%   0.73%   9.02%      943806    945747  11625088   _P_LookForPlayers
   0.70%   0.70%   0.73%      901444    903074    940960   add_wall_segment
   0.61%   0.61%  14.60%      782750    784987  18814282   _P_Move
   0.55%   0.55%  15.94%      708808    713079  20539155   _P_TryMove
   0.50%   0.50%  26.76%      646818    648509  34484834   _P_SetMobjState
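The quoted percentages check out as relative reductions in time, i.e. (before - after) / before. The worst-frame figures match a comparison against the worst frames of the first flat-shading profile (0.24291s and 0.18234s), and the overall figure matches the 70.71545s run from the sprite-fastpath commit. A trivial C helper to reproduce them:

```c
#include <assert.h>

/* Relative improvement in percent when a time drops from before to after. */
static double improvement_pct(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

improvement_pct(0.24291, 0.20406) gives about 15.99, improvement_pct(0.18234, 0.17286) about 5.20, and improvement_pct(70.71545, 70.54821) about 0.24.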
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

Regarding Doom II PWADs...

mtfactor

I think the Doom II mtfactor.wad PWAD also looks better with flat shading, and now I could pick up all the pickups. But there's a very minor drawing issue with it: the water at the beginning of the level has a glitch which flickers when one moves around it:
[Attached screenshot: grab0003.png]
polygon

When I tried to play the third round of polygon.wad, BM gave an allocation error on level reload, so maybe there's some leak with regard to PWADs:

Code:

Error: Z_Malloc: failed on allocation of 188184 bytes
After that, it happened immediately on the next BM start with that PWAD, if I let BM run the timedemo at startup. So it might also just be that there are too many objects...?
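To make the "leak vs. too many objects" question concrete, here is a heavily simplified, hypothetical model of a Doom-style zone in C. As in Doom, every block carries a purge tag (PU_STATIC = 1, PU_LEVEL = 50) and level setup frees everything tagged PU_LEVEL or above; a block that should be level-scoped but is tagged static survives every reload, so free space shrinks until an allocation of the size in the error above fails. Only byte counts are tracked, not a real heap, and BM's Z_Malloc may well differ. The 1.5 MB zone size is borrowed from the totals in the startup crash log earlier in the thread (0x60820 used + 0x11f7e0 free = 0x180000).

```c
#include <assert.h>
#include <stddef.h>

enum { PU_STATIC = 1, PU_LEVEL = 50 };  /* tag values as in Doom's z_zone.h */

#define ZONE_SIZE  (1536u * 1024u)  /* 0x180000 bytes */
#define MAX_BLOCKS 1024

typedef struct { size_t size; int tag; int live; } zblock;

typedef struct {
    zblock blocks[MAX_BLOCKS];
    int    nblocks;
    size_t used;
} zone;

/* Returns a block id, or -1 when the request cannot be satisfied
 * (the point where real code reports "Z_Malloc: failed on allocation"). */
static int zone_alloc(zone *z, size_t size, int tag)
{
    if (z->nblocks == MAX_BLOCKS || size > ZONE_SIZE - z->used)
        return -1;
    z->blocks[z->nblocks] = (zblock){ size, tag, 1 };
    z->used += size;
    return z->nblocks++;
}

/* Level reload: free every live block tagged PU_LEVEL or above. */
static void zone_free_level(zone *z)
{
    for (int i = 0; i < z->nblocks; i++)
        if (z->blocks[i].live && z->blocks[i].tag >= PU_LEVEL) {
            z->blocks[i].live = 0;
            z->used -= z->blocks[i].size;
        }
}
```

In this model, reloading a level that allocates 188184 bytes tagged PU_LEVEL works indefinitely, but leaking an extra 200000-byte static block per reload makes the seventh load fail.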

A fairly complete polygon.wad profile looks like this:

Code: Select all

Time spent in profile = 535.71525s.
...
Executed instructions:
  30.13%                   304182591                       R_AdvanceSurface_flatshade
  12.94%  12.96%  30.88%   130626931 130818507 311747445   _P_RunThinkers
  10.60%                   106973251                       R_VisPlaneFlatShader
   5.61%                    56624325                       R_SpriteColumnShader_Masked2
   4.87%   4.91%   5.09%    49136991  49565929  51348878   _BM_P_CrossBSPNode
   3.09%   3.10%   8.86%    31215040  31271916  89503964   _P_LookForPlayers
   2.56%                    25871647                       R_BSPHyperPlane
   2.01%   2.01%   2.06%    20295815  20310902  20774471   _R_PointInSubsector
   1.84%   1.84%   1.84%    18569278  18580653  18580653   _BM_A_Mux1x2
   1.62%   1.62%   6.80%    16337468  16403323  68654426   _BM_P_CheckSight
   1.41%   1.42%   1.48%    14272481  14309829  14902035   _P_UpdateSpecials
   1.33%   1.36%   1.41%    13467053  13686390  14206530   stack_visplane_area
   1.31%                    13210554                       R_AddLine_loop
   1.28%   1.28%   3.21%    12915461  12947492  32419218   _P_CheckPosition
   1.05%                    10648039                       R_DrawSurface_flatshade
   1.02%                    10285150                       R_StackTransparentSurface
   0.99%   0.99%  15.52%     9954495   9979682 156681507   _P_SetMobjState
   0.94%   0.94%   0.98%     9506542   9528448   9877880   R_ViewTestSpriteLines
   0.92%   0.92%   0.94%     9297293   9309646   9516556   D_FlatMipGen_8_16
   0.66%   0.66%   0.84%     6674584   6684270   8520398   get_flat_floor
   0.66%                     6622005                       build_ssector
   0.63%   0.63%   0.68%     6393684   6408171   6837689   R_AddSpriteSpans
   0.55%   0.55%   1.79%     5581860   5593735  18054244   get_ssector
   0.53%   0.53%   0.58%     5324029   5333660   5867747   _R_ClearPlanes
...
Instruction cache misses:
  15.32%  15.41%  62.80%    19039955  19151802  78030973   _P_RunThinkers
   9.13%   9.16%  13.12%    11347744  11376999  16303264   _BM_P_CheckSight
   8.93%   8.95%  20.93%    11094556  11123719  26005355   _P_LookForPlayers
   5.27%   5.28%  41.54%     6549856   6562391  51621294   _P_SetMobjState
   3.67%   3.68%   7.38%     4564389   4578661   9171269   _P_CheckPosition
   3.66%   3.67%  27.19%     4546738   4554652  33787288   _A_Look

DSP side, used cycles:
  46.68%                  8023290492                       command_base
  18.70%  20.96%  20.96%  3214196414 3602734920 3602734920   R_DoColumnPerspCorrect
  10.02%                  1722076216                       R_VPRenderFlat
   3.63%                   624473260                       ALGO_P_CrossBSPNode
   2.99%   2.99%   2.99%   513827442 513827442 513827442   extract_subvisplane
   2.73%                   470084810                       R_DoColumnTextureUV
   1.90%                   326865036                       R_ViewTestAddLine
   1.77%   2.15%   4.41%   304555474 368780408 757771202   AddLowerWall
   1.65%   1.90%  19.28%   284374444  326667230 3314239434   AddMidWall
   1.43%                   245043468                       P_CrossSubsector_body
   1.38%                   237275546                       project_node
   0.91%                   156767514                       R_CheckBBoxPair
   0.75%   0.89%   3.09%   128825108 153581568 530941548   AddUpperWall
   0.51%   0.51%   0.51%    87862150  87870890  87870890   InterceptVectorsUF
I don't think there's anything really interesting there. However...

The worst render frame does this:

Code:

Time spent in profile = 1.67425s.

Visits/calls:
  97.38%  97.61%       52412     52536   correct_element
   0.62%                 334             init_font
   0.59%                 319             R_AdvanceSurface_flatshade

Executed instructions:
  34.67%  34.77%  35.98%      995904    998743   1033627   correct_element
  21.16%  21.19%  21.59%      607832    608730    620228   D_FlatMipGen_24_16
  10.86%  10.88%  44.81%      312006    312447   1287281   create_local_palette_64levels
   8.74%   8.76%   9.00%      251189    251778    258541   D_TextureRemapPixelSubBlock_16_8
   8.45%   8.47%  12.22%      242673    243195    350951   D_FlatMipRemap_16_8
   6.84%   6.86%   7.02%      196634    197010    201770   D_TextureRemapInit
   3.19%                       91581                       R_AdvanceSurface_flatshade
   2.28%   2.29%   2.29%       65624     65664     65664   _BM_A_Mux1x2
   1.08%   1.09%   1.13%       31158     31220     32344   D_PatchTargetGetApproxRGB
   0.70%   0.70%   0.70%       20042     20053     20053   D_FlatPackMipPages
   0.69%   0.69%   2.81%       19811     19873     80631   D_FlatCLUTGen_32x32
...
Instruction cache misses:
  34.38%                       17130                       correct_element
  33.06%  33.11%  66.47%       16473     16498     33121   D_FlatCLUTGen_32x32
   4.74%   5.09%  15.29%        2362      2534      7617   create_local_palette_64levels
   4.35%   4.35%   7.68%        2167      2167      3826   _subframe_block
   4.04%                        2013                       init_font
   3.29%   3.33%   3.33%        1637      1659      1659   _BM_A_Mux1x2
   2.53%   2.53%   2.53%        1261      1261      1261   _frame_event
   2.29%   3.16%   5.20%        1142      1574      2592   D_FlatMipGen_24_16
   1.55%   1.55%  11.76%         773       773      5860   _audio_mux_frame
Seems I hit something I hadn't hit earlier?

Is there something in instruction cache misses that could be "easily" improved? Cache miss callgraph is attached, and profile data looks like this:

Code:

correct_element:
$055f1a :             mulu.w    d5,d7                      1.82% (52416, 1481468, 5389)
$055f1c :             neg.l     d7                         1.82% (52416, 209720, 15)
$055f1e :             add.l     #$ffff,d7                  1.82% (52416, 433576, 3170)
$055f24 :             move.l    d7,d6                      1.82% (52416, 209752, 10)
$055f26 :             mulu.l    d7,d7                      1.82% (52416, 2517336, 0)
$055f2a :             clr.w     d7                         1.82% (52416, 209720, 3166)
$055f2c :             swap      d7                         1.82% (52416, 212536, 2)
$055f2e :             mulu.w    $55f5c(pc),d7              1.82% (52416, 1888352, 0)
$055f32 :             move.w    #$100,d4                   1.82% (52416, 209664, 0)
$055f36 :             sub.w     $55f5c(pc),d4              1.82% (52416, 421232, 0)
$055f3a :             mulu.w    d4,d6                      1.82% (52416, 1473040, 1081)
$055f3c :             add.l     d6,d7                      1.82% (52416, 205452, 7)
$055f3e :             lsr.l     #8,d7                      1.82% (52416, 214524, 1061)
$055f40 :             neg.l     d7                         1.82% (52416, 209888, 2)
$055f42 :             add.l     #$ffff,d7                  1.82% (52416, 424016, 1061)
$055f48 :             bpl.s     $55f4c                     1.82% (52416, 420008, 0)
[...]
$055f4c :             cmp.l     #$ffff,d7                  1.82% (52416, 419884, 0)
$055f52 :             bmi.s     $55f5a                     1.82% (52416, 424068, 1063)
[...]
$055f5a :             rts                                  1.82% (52416, 638232, 1103)
...
D_FlatCLUTGen_32x32:
$04f8fc :             lea       $800(a2),a2                0.00% (1, 8, 1)
$04f900 :             moveq     #$1f,d7                    0.00% (1, 4, 0)
$04f902 :             move.w    d7,-(sp)                   0.00% (32, 256, 16)
$04f904 :             moveq     #0,d5                      0.00% (32, 24, 0)
$04f906 :             move.w    d7,d5                      0.00% (32, 128, 6)
$04f908 :             lsl.w     #8,d5                      0.00% (32, 128, 0)
$04f90a :             divu.w    #$1f,d5                    0.00% (32, 1536, 0)
$04f90e :             move.w    #$1f,d6                    0.00% (32, 128, 0)
$04f912 :             movea.l   a1,a4                      0.00% (32, 128, 32)
$04f914 :             lea       $ffc0(a2),a2               0.00% (32, 256, 32)
$04f918 :             movea.l   a2,a3                      0.00% (32, 128, 0)
$04f91a :             move.w    d6,-(sp)                   0.04% (1024, 8192, 3008)
$04f91c :             moveq     #0,d7                      0.04% (1024, 4096, 0)
$04f91e :             move.b    (a4)+,d7                   0.04% (1024, 12288, 1024)
$04f920 :             bsr       $55f1a                     0.04% (1024, 16384, 1024)
$04f924 :             move.w    d7,d1                      0.04% (1024, 4096, 2048)
$04f926 :             moveq     #0,d7                      0.04% (1024, 4208, 1024)
$04f928 :             move.b    (a4)+,d7                   0.04% (1024, 8192, 0)
$04f92a :             bsr       $55f1a                     0.04% (1024, 8192, 0)
$04f92e :             move.w    d7,d2                      0.04% (1024, 4096, 3072)
$04f930 :             moveq     #0,d7                      0.04% (1024, 4152, 0)
$04f932 :             move.b    (a4)+,d7                   0.04% (1024, 12288, 1026)
$04f934 :             bsr       $55f1a                     0.04% (1024, 16384, 1024)
$04f938 :             move.w    d7,d3                      0.04% (1024, 4096, 2048)
$04f93a :             bfins     d1,d4{16:16}               0.04% (1024, 12352, 0)
$04f93e :             bfins     d2,d4{21:16}               0.04% (1024, 12344, 0)
$04f942 :             bfins     d3,d4{27:16}               0.04% (1024, 12288, 0)
$04f946 :             move.w    d4,(a3)+                   0.04% (1024, 8192, 1024)
$04f948 :             move.w    (sp)+,d6                   0.04% (1024, 8192, 0)
$04f94a :             dbra      d6,$4f91a                  0.04% (1024, 8320, 0)
$04f94e :             move.w    (sp)+,d7                   0.00% (32, 384, 32)
$04f950 :             dbra      d7,$4f902                  0.00% (32, 392, 32)
$04f954 :             rts
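For readers less fluent in 68030 code, here is a rough C reading of the D_FlatCLUTGen_32x32 loop above. It is a sketch reconstructed from the listing only, not BM's source: the names are hypothetical, the per-component correction (correct_element) is passed in as a callback, and `linear_component` is a stand-in that applies no gamma at all. The outer loop runs 32 brightness levels with scale = level * 256 / 31 (the divu.w #$1f), the inner loop converts 32 RGB byte triples, and the three overlapping bfins instructions leave the top 5, 6 and 5 bits of the 16-bit R, G and B results in the low word, i.e. they pack RGB565.

```c
#include <stdint.h>

/* Per-component correction callback, standing in for correct_element:
 * takes an 8-bit source component and a brightness scale (0..256) and
 * returns a 16-bit fixed-point intensity (0xffff ~ 1.0). */
typedef uint16_t (*component_fn)(uint8_t c, uint16_t scale);

/* Hypothetical stand-in with no gamma: just c * scale (max 255 * 256). */
static uint16_t linear_component(uint8_t c, uint16_t scale)
{
    return (uint16_t)(c * scale);
}

/* What the three bfins instructions leave in the low word of d4:
 * top 5 bits of R, top 6 bits of G, top 5 bits of B (RGB565). */
static uint16_t pack_rgb565(uint16_t r16, uint16_t g16, uint16_t b16)
{
    return (uint16_t)((r16 & 0xF800u) | ((g16 >> 10) << 5) | (b16 >> 11));
}

/* 32 brightness levels x 32 source colours (RGB byte triples).
 * Level 31 is full brightness; entries are stored at level * 32 + colour,
 * matching the $800 base and -$40 step per level in the listing. */
static void build_clut_32x32(const uint8_t src_rgb[32 * 3],
                             uint16_t clut[32 * 32],
                             component_fn correct)
{
    for (int level = 31; level >= 0; level--) {
        uint16_t scale = (uint16_t)((level << 8) / 31);  /* divu.w #$1f */
        const uint8_t *p = src_rgb;                      /* a4 = a1 */
        uint16_t *out = &clut[level * 32];               /* a3 */
        for (int i = 0; i < 32; i++, p += 3)
            *out++ = pack_rgb565(correct(p[0], scale),
                                 correct(p[1], scale),
                                 correct(p[2], scale));
    }
}
```

With `linear_component`, a white source colour comes out as 0xFFFF at level 31 and as 0 at level 0.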
Worst thinking frame does this:

Code:

Time spent in profile = 0.26271s.
...
Visits/calls:
  40.36% 1638.19%        2364     95949   _P_RecursiveSound
   6.54%   7.00%         383       410   _PIT_CheckLine
   6.20%  13.45%         363       788   _P_CheckSight
...
Executed instructions:
  25.59%  25.65% 1046.64%      111765    112007   4570743   _P_RecursiveSound
  18.80%  18.84%  68.44%       82082     82280    298864   _P_RunThinkers
  10.96%  10.96%  10.96%       47860     47857     47857 * _BM_P_CrossBSPNode
   8.93%   8.93%   8.93%       38979     39000     39000   _R_PointInSubsector
   5.98%   5.99%  14.84%       26102     26156     64799   _P_CheckPosition
   4.05%   4.06%  13.17%       17688     17726     57506   _P_LookForPlayers
   3.07%   3.07%   3.07%       13400     13407     13407   _PIT_AddLineIntercepts_S
   2.71%   2.81%  13.77%       11832     12271     60128   _BM_P_CheckSight
   2.58%   2.59%   2.59%       11275     11295     11295   _BM_A_Mux1x2
   2.26%   2.27%   5.78%        9873      9908     25224   _P_PathTraverse
   2.04%   2.05%   2.05%        8910      8937      8937   _P_UpdateSpecials
   1.41%   1.42%   1.78%        6179      6210      7794   _PIT_CheckLine
   1.41%   1.41%  32.84%        6136      6136    143411   _P_SetMobjState
   0.97%   0.98%   0.98%        4241      4262      4262   _PIT_CheckThing
   0.82%   0.82%  17.15%        3575      3582     74909   _P_TryMove
   0.78%   0.79%  17.59%        3421      3448     76829   _A_Look
   0.66%   0.66%  14.43%        2904      2904     63032   _P_CheckSight
...
Sound recursion taking 1/4 of the CPU is a bit surprising. It's called from A_ReFire(); the callgraph is attached and the code is below:

Code:

_P_RecursiveSound:
$02e4f0 :             movem.l   d2-d3/a2-a3,-(sp)          0.54% (2364, 94584, 6)
$02e4f4 :             movea.l   $14(sp),a3                 0.54% (2364, 28448, 6)
$02e4f8 :             move.w    $1a(sp),d3                 0.54% (2364, 18940, 7)
$02e4fc :             move.w    $8eee8,d0                  0.54% (2364, 18912, 0)
$02e502 :             cmp.w     $40(a3),d0                 0.54% (2364, 18968, 0)
$02e506 :             bne.s     $2e514                     0.54% (2364, 9556, 7)
$02e508 :             movea.w   $12(a3),a1                 0.41% (1793, 14368, 6)
$02e50c :             movea.w   d3,a0                      0.41% (1793, 8, 0)
$02e50e :             addq.l    #1,a0                      0.41% (1793, 7172, 4)
$02e510 :             cmpa.l    a1,a0                      0.41% (1793, 7228, 0)
$02e512 :             bge.s     $2e584                     0.41% (1793, 14364, 7)
$02e514 :             move.w    d0,$40(a3)                 0.13% (571, 4596, 7)
$02e518 :             move.w    d3,d0                      0.13% (571, 2272, 0)
$02e51a :             addq.w    #1,d0                      0.13% (571, 2284, 10)
$02e51c :             move.w    d0,$12(a3)                 0.13% (571, 4608, 10)
$02e520 :             move.l    $2bfb68,$14(a3)            0.13% (571, 11500, 10)
$02e528 :             clr.w     d2                         0.13% (571, 2284, 0)
$02e52a :             cmp.w     $4c(a3),d2                 0.13% (571, 4624, 0)
$02e52e :             bge.s     $2e584                     0.13% (571, 2288, 1)
$02e530 :             movea.w   d2,a1                      0.95% (4130, 16556, 12)
$02e532 :             movea.l   $50(a3),a0                 0.95% (4130, 49560, 0)
$02e536 :             movea.l   (a0,a1.l*4),a2             0.95% (4130, 49672, 0)
$02e53a :             movea.l   $32(a2),a1                 0.95% (4130, 49676, 0)
$02e53e :             tst.l     a1                         0.95% (4130, 16520, 5)
$02e540 :             beq.s     $2e57c                     0.95% (4130, 22908, 0)
$02e542 :             movea.l   $2e(a2),a0                 0.58% (2530, 30416, 0)
$02e546 :             move.l    (a0),d0                    0.58% (2530, 30488, 4)
$02e548 :             cmp.l     4(a1),d0                   0.58% (2530, 30376, 4)
$02e54c :             bge.s     $2e57c                     0.58% (2530, 10120, 0)
$02e54e :             move.l    4(a0),d0                   0.55% (2402, 28824, 0)
$02e552 :             cmp.l     (a1),d0                    0.55% (2402, 28840, 4)
$02e554 :             ble.s     $2e57c                     0.55% (2402, 9756, 0)
$02e556 :             move.l    a0,d0                      0.54% (2365, 9460, 4)
$02e558 :             cmpa.l    d0,a3                      0.54% (2365, 9516, 0)
$02e55a :             bne.s     $2e55e                     0.54% (2365, 14204, 4)
$02e55c :             move.l    a1,d0                      0.27% (1183, 4788, 0)
$02e55e :             btst      #6,$11(a2)                 0.54% (2365, 28396, 4)
$02e564 :             beq.s     $2e570                     0.54% (2365, 18908, 0)
$02e566 :             tst.w     d3                         0.00% (3, 12, 2)
$02e568 :             bne.s     $2e57c                     0.00% (3, 24, 0)
[...]
$02e570 :             movea.w   d3,a0                      0.54% (2362, 9448, 7)
$02e572 :             move.l    a0,-(sp)                   0.54% (2362, 28456, 1)
$02e574 :             move.l    d0,-(sp)                   0.54% (2362, 28400, 0)
$02e576 :             bsr       $2e4f0                     0.54% (2362, 18896, 0)
$02e57a :             addq.l    #8,sp                      0.54% (2362, 9448, 16)
$02e57c :             addq.w    #1,d2                      0.95% (4130, 16576, 5)
$02e57e :             cmp.w     $4c(a3),d2                 0.95% (4130, 33152, 0)
$02e582 :             blt.s     $2e530                     0.95% (4130, 16584, 10)
$02e584 :             movem.l   (sp)+,d2-d3/a2-a3          0.54% (2364, 113544, 18)
$02e588 :             rts                                  0.54% (2364, 28480, 0)
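Read back into C, the listing appears to follow the flood fill of id's original P_RecursiveSound(), except that it uses the line's frontsector/backsector (a NULL backsector instead of the ML_TWOSIDED flag check) and a simpler inline floor/ceiling overlap test instead of calling P_LineOpening(). The sketch below is that reading, not BM's source: the structs are cut down to the fields the code touches (comments give the offsets the listing uses, but the layout itself is not BM's), and the call counter is added for illustration. The ML_SOUNDBLOCK branch is partly elided ([...]) in the listing, so it follows the original Doom source there. The profile numbers fit this shape: of 2364 entries only 571 get past the "already flooded" test, and those 571 floods iterate 4130 lines, so most calls return almost immediately.

```c
#include <stddef.h>
#include <stdint.h>

#define ML_SOUNDBLOCK 64          /* flags bit 6: btst #6,$11(a2) */

typedef struct sector_s sector_t;

typedef struct {                  /* cut-down line_t */
    int16_t   flags;
    sector_t *frontsector;
    sector_t *backsector;         /* NULL on one-sided lines */
} line_t;

struct sector_s {                 /* cut-down sector_t */
    int32_t   floorheight, ceilingheight;
    int       validcount;         /* compared with the global at $40(a3) */
    int       soundtraversed;     /* $12(a3): soundblocks + 1 */
    void     *soundtarget;        /* $14(a3) */
    int       linecount;          /* $4c(a3) */
    line_t  **lines;              /* $50(a3) */
};

static int   validcount = 1;      /* bumped once per noise alert */
static void *soundtarget;         /* the mobj that made the noise */
static long  recursive_calls;     /* instrumentation, not in BM or Doom */

static void recursive_sound(sector_t *sec, int soundblocks)
{
    recursive_calls++;
    /* Already flooded this alert via a path with no more sound blocks? */
    if (sec->validcount == validcount && sec->soundtraversed <= soundblocks + 1)
        return;
    sec->validcount = validcount;
    sec->soundtraversed = soundblocks + 1;
    sec->soundtarget = soundtarget;

    for (int i = 0; i < sec->linecount; i++) {
        line_t *check = sec->lines[i];
        sector_t *front = check->frontsector, *back = check->backsector;
        if (back == NULL)
            continue;                             /* one-sided wall */
        if (front->floorheight >= back->ceilingheight ||
            front->ceilingheight <= back->floorheight)
            continue;                             /* no vertical opening */
        sector_t *other = (front == sec) ? back : front;
        if (check->flags & ML_SOUNDBLOCK) {
            if (!soundblocks)                     /* at most one block */
                recursive_sound(other, 1);
        } else {
            recursive_sound(other, soundblocks);
        }
    }
}
```

For three open sectors in a row (A-B-C), one alert from A costs 5 calls for 3 floods: every open line of every flooded sector produces a call, even back into a sector that was already flooded.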
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Hi, I'm short on time for the project this week but will reply quickly:
Eero Tamminen wrote: I think Doom2 mtfactor.wad PWAD is also better with flat shading and now I could pick up all the pickups. But there's very minor drawing issue with it, the water in the beginning of the level has a glitch which flickers when one moves around it:
There are some glitches of this kind in BSP classification. There are also some precision shortcuts taken in two related places, so I can try tightening those up to see if it helps. However, I'm aware of some of these glitches in levels anyway, because the BSP cutting tool slices lines in integer space, which creates some degenerate vertices and stitching problems. It can be seen on E1M1, for example, on the left side as you walk into the room.
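To illustrate why cutting lines in integer space produces degenerate vertices, here is a small C sketch (not the BSP tool's code; all names are hypothetical). It splits a segment against a partition line exactly, snaps the new vertex to the integer grid, and measures with a cross product how far the snapped vertex ends up from the original line. For the segment (0,0)-(7,3) cut at x = 3, the exact split point is (3, 9/7); snapping gives (3,1), which is off the original line, so the two halves are no longer collinear and neighbouring edges can fail to meet exactly.

```c
typedef struct { double x, y; } vec2;

/* Exact intersection of segment p0-p1 with the partition line through a
 * along direction d. Assumes they are not parallel. */
static vec2 split_point(vec2 p0, vec2 p1, vec2 a, vec2 d)
{
    double ex = p1.x - p0.x, ey = p1.y - p0.y;
    double t = (d.y * (p0.x - a.x) - d.x * (p0.y - a.y)) / (d.x * ey - d.y * ex);
    vec2 r = { p0.x + t * ex, p0.y + t * ey };
    return r;
}

/* Snap to the integer grid (round half away from zero), as a tool that
 * stores split vertices as integers would. */
static vec2 snap_to_grid(vec2 p)
{
    vec2 r = { (double)(long)(p.x + (p.x < 0 ? -0.5 : 0.5)),
               (double)(long)(p.y + (p.y < 0 ? -0.5 : 0.5)) };
    return r;
}

/* Cross product (p1 - p0) x (q - p0): zero iff q is on the line p0-p1;
 * dividing by |p1 - p0| would give the perpendicular distance. */
static double off_line_cross(vec2 p0, vec2 p1, vec2 q)
{
    return (p1.x - p0.x) * (q.y - p0.y) - (p1.y - p0.y) * (q.x - p0.x);
}
```

Here the snapped vertex gives a cross product of -2, i.e. about 0.26 map units off the original wall (2 / sqrt(58)).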

The ones that are caused by deliberate precision dropping can be tidied up though.
Eero Tamminen wrote: When trying to play third round of polygon.wad, BM gave allocation error on level reload, so maybe there's some leak in regards to PWADs:

Code:

Error: Z_Malloc: failed on allocation of 188184 bytes
After that it happened immediately on next BM start with that PWAD, if I let BM run the timedemo at startup. So it might also be just that there are too many objects...?
I think this is a flat-memory-model fragmentation issue with Doom, combined with the fact that I have reduced its arena size considerably. It seems to start having problems after a few reloads. I'm considering switching some of the Z_Malloc allocations over to BM's 'virtual memory' model instead, although it will be painful to do. The two models are not very compatible.
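A toy model of the fragmentation described above: a fixed-size arena with first-fit allocation of contiguous runs. It is much simpler than Doom's actual zone allocator and the sizes are arbitrary (ARENA_SLOTS and the function names are hypothetical); it only shows how a request can fail while plenty of memory is free in total, which is what a "Z_Malloc: failed on allocation" after several reloads would look like.

```c
#define ARENA_SLOTS 16            /* hypothetical arena size, in fixed slots */

static int arena[ARENA_SLOTS];    /* 0 = free, otherwise owner id */

/* First-fit: claim the first free run of n (>= 1) slots for 'id'.
 * Returns the start slot, or -1 if no contiguous run is big enough. */
static int toy_alloc(int id, int n)
{
    int run = 0;
    for (int i = 0; i < ARENA_SLOTS; i++) {
        run = (arena[i] == 0) ? run + 1 : 0;
        if (run == n) {
            int start = i - n + 1;
            for (int j = start; j <= i; j++)
                arena[j] = id;
            return start;
        }
    }
    return -1;
}

static void toy_free(int id)
{
    for (int i = 0; i < ARENA_SLOTS; i++)
        if (arena[i] == id)
            arena[i] = 0;
}

static int free_slots(void)
{
    int n = 0;
    for (int i = 0; i < ARENA_SLOTS; i++)
        n += (arena[i] == 0);
    return n;
}
```

Filling the arena with eight 2-slot blocks and freeing every other one leaves 8 slots free, yet a 4-slot request fails because no free run is longer than 2.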
Eero Tamminen wrote: Seems I hit something I hadn't hit earlier?

Is there something in instruction cache misses that could be "easily" improved? Cache miss callgraph is attached, and profile data looks like this:

Code:

correct_element:
$055f1a :             mulu.w    d5,d7                      1.82% (52416, 1481468, 5389)
$055f1c :             neg.l     d7                         1.82% (52416, 209720, 15)

'correct_element' is gamma correction for palette table building. It's directly related to texture loading from the original sources, so it shouldn't happen during play. I think you just hit some sprites or textures that hadn't been seen yet.

It shouldn't do this when loading from the BMC cache for example - those are already precalculated.

Provided the sprites etc. are pre-converted, you'll never see this, so there's not much point in optimizing it.
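For reference, here is a C reading of the correct_element listing quoted above, as a sketch rather than BM's actual source. With the 8-bit input scaled to v = c * scale (16-bit fixed point, 0xffff ~ 1.0) and x = 1 - v, it computes 1 - ((w/256) * x^2 + (1 - w/256) * x), where w is the word stored at $55f5c. Its value isn't in the listing, so it is a parameter here, assumed to be in 0..256. w = 0 gives a linear ramp and w = 256 gives 1 - (1 - v)^2, a brightening curve. The two clamp branches are elided ([...]) in the listing, so clamping to 0..0xffff is also an assumption.

```c
#include <stdint.h>

/* c: 8-bit source component (d7 on entry); scale: brightness 0..256 (d5);
 * w: curve weight 0..256 (the word at $55f5c, value unknown here). */
static uint16_t correct_element_c(uint8_t c, uint16_t scale, uint16_t w)
{
    uint32_t v = (uint32_t)c * scale;          /* mulu.w d5,d7 */
    uint32_t x = 0xFFFFu - v;                  /* neg.l + add.l #$ffff */
    uint32_t xx = (x * x) >> 16;               /* mulu.l, clr.w, swap */
    uint32_t sum = xx * w + x * (256u - w);    /* blend x^2 and x by w/256 */
    int32_t r = 0xFFFF - (int32_t)(sum >> 8);  /* lsr.l #8, neg.l, add.l */
    /* Assumed clamps for the elided [...] branches. */
    if (r < 0)
        r = 0;
    if (r > 0xFFFF)
        r = 0xFFFF;
    return (uint16_t)r;
}
```

For example, mid-grey at full brightness (c = 128, scale = 256) maps to 0x8000 with w = 0 and to 0xC000 (0.75) with w = 256.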
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

Eero Tamminen wrote:Sound recursion taking 1/4 of CPU is a bit surprising. It's called from A_ReFire(), callgraph is attached, and the code is below:
From the sources I noticed that you have separate versions of this and of its direct caller, P_NoiseAlert(), for different BM game-logic optimization levels. Calls to it are timer-limited at higher TIMERBASE_CONTROL levels, so I hope it's already OK there. I guess I need to do some manual profiling again with them.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote:
Eero Tamminen wrote:Sound recursion taking 1/4 of CPU is a bit surprising. It's called from A_ReFire(), callgraph is attached, and the code is below:
From sources I noticed that you had separate version of this and its (direct) P_NoiseAlert() caller for different BM game logic optimization levels. Calling of this is timer-limited on higher TIMERBASE_CONTROL levels, so I hope it's already OK on them. I guess I need to do some manual profiling again with them.
Hi,

Yes, P_NoiseAlert() and its RecursiveSound() component are interesting: they seem innocuous most of the time but occasionally cause a huge overhead, sometimes regularly throughout a level.

I replaced them with a modified version of the recursive part and placed a duty-cycle control at the top, so alerts can't be initiated more than once per second per sector. This does get rid of the problem, but it may just be trading regular, obvious overhead for rare spikes that don't show up in normal profiling. It may still show in worst-frame profiling sometimes.

So it should be a lot better with TIMEBASE_CONTROL at higher levels but it's probably still not ideal. I spent much less time looking at this than the LOS code.
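A minimal sketch of a per-sector duty cycle of the kind described above. It is not BM's code: the per-sector timestamp, the names and the exact comparison are assumptions; only the "once per second per sector" rule comes from the post, and 35 is Doom's tic rate. Note that such a gate limits how often a flood may start, not how far a single flood spreads.

```c
#include <stdbool.h>

#define TICRATE      35            /* Doom's game tics per second */
#define ALERT_PERIOD TICRATE       /* hypothetical: one alert per second */

typedef struct {
    int  last_alert_tic;           /* hypothetical per-sector field */
    bool has_alerted;
} sector_alert_t;

/* Returns true (and records 'now') if a new noise alert may start in
 * this sector; false while it is still inside the quiet period. */
static bool alert_allowed(sector_alert_t *s, int now)
{
    if (s->has_alerted && now - s->last_alert_tic < ALERT_PERIOD)
        return false;
    s->has_alerted = true;
    s->last_alert_tic = now;
    return true;
}
```

An alert at tic 100 blocks further alerts from that sector until tic 135.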
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

BTW I did something similar with P_CheckSight code after all the asm optimizations - the top level implements a cache using the pair of 'mobj' AI objects as a key, and only allows new samples to be taken for a given mobj pair after approx 1 second has elapsed. So an increase in P_CheckSight frequency doesn't result in a proportional rise in cost.

This pretty much removes it from the profiling but it does make the AIs a bit sleepy when you surprise them. It should probably be adjusted a bit to 0.5 seconds or so.
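A minimal sketch of a sight cache keyed by the mobj pair, along the lines described above. It is not BM's code: the table size, the hash, the direct-mapped replacement and `stub_check_sight` (a counting stand-in for the real line-of-sight test) are all assumptions. SIGHT_MAX_AGE is in tics; at Doom's 35 tics per second, 35 is about 1 second and roughly 17 would give the 0.5 seconds suggested above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIGHT_CACHE_SIZE 64       /* hypothetical, must be a power of two */
#define SIGHT_MAX_AGE    35       /* ~1 s at 35 tics/s; ~17 for 0.5 s */

typedef struct {
    const void *looker, *target;  /* the mobj pair used as the key */
    int tic;                      /* when the result was sampled */
    bool visible;
} sight_entry_t;

typedef bool (*sight_fn)(const void *looker, const void *target);

static sight_entry_t sight_cache[SIGHT_CACHE_SIZE];

static unsigned pair_hash(const void *a, const void *b)
{
    uintptr_t h = (uintptr_t)a * 31u ^ (uintptr_t)b;
    return (unsigned)(h ^ (h >> 7)) & (SIGHT_CACHE_SIZE - 1);
}

/* Return the cached result for (looker, target) if it is younger than
 * SIGHT_MAX_AGE tics; otherwise run the expensive check and store it.
 * A colliding pair simply overwrites the slot (direct-mapped). */
static bool cached_check_sight(const void *looker, const void *target,
                               int now, sight_fn check)
{
    sight_entry_t *e = &sight_cache[pair_hash(looker, target)];
    if (e->looker == looker && e->target == target &&
        now - e->tic < SIGHT_MAX_AGE)
        return e->visible;
    e->looker = looker;
    e->target = target;
    e->tic = now;
    e->visible = check(looker, target);
    return e->visible;
}

/* Hypothetical stand-in for the real LOS test, counting its calls. */
static int stub_calls;
static bool stub_check_sight(const void *looker, const void *target)
{
    (void)looker;
    (void)target;
    stub_calls++;
    return true;
}
```

The "sleepy AI" effect follows directly: a monster's view of the player can be up to SIGHT_MAX_AGE tics stale.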
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

I've now tried polygon.wad with TIMERBASE_CONTROL=3 and it's pretty playable with that.

But there are a lot of small pauses during the game when BM loads more data from disk. Shouldn't these, for example, be loaded at startup rather than when the player uses the pistol or shotgun for the first time:

Code:

SPR\PISFA0.BMT
SPR\PISGB0.BMT
SPR\PISGC0.BMT
SPR\SHOTA0.BMT
SPR\SHTGA0.BMT
SPR\SHTFB0.BMT
SPR\SHTGB0.BMT
SPR\SHTGC0.BMT
SPR\SHTGD0.BMT
flats\TLITE6_5.bmp
TEX\SW2COMM.BMT
TEX\BLODRIP2.BMT
TEX\BLODRIP3.BMT
TEX\BLODRIP4.BMT
...
The TLITE* flat loads are especially noticeable.
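One possible reason for some of these, assuming BM's precaching works like vanilla Doom's: there, R_PrecacheLevel() marks only flats used by sectors, textures referenced by sidedefs (plus the sky) and sprites of mobjs present at level start. Alternate switch textures (SW2COMM), later animation frames (BLODRIP2-4) and the player's weapon and flash sprites (PISG, PISF, SHTG, SHTF) are never marked. The sketch below expands a sidedef texture list with switch partners and animation frames; the tables hold only one real Doom switch pair and one real animation range, and all function names are hypothetical. It does not explain the TLITE6_5 load.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tables; the entries are real Doom switch/animation names. */
static const char *const switch_pairs[][2] = {
    { "SW1COMM", "SW2COMM" },
};
static const char *const anim_frames[][4] = {
    { "BLODRIP1", "BLODRIP2", "BLODRIP3", "BLODRIP4" },
};

/* Append 'name' to list[0..n) unless already present or the list is full. */
static int add_unique(const char **list, int n, int max, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return n;
    if (n < max)
        list[n++] = name;
    return n;
}

/* Extend the textures referenced directly by sidedefs with the switch
 * partners and animation frames they can turn into during play. */
static int expand_precache(const char *const *used, int n_used,
                           const char **out, int max_out)
{
    int n = 0;
    for (int i = 0; i < n_used; i++) {
        n = add_unique(out, n, max_out, used[i]);
        for (size_t s = 0; s < sizeof switch_pairs / sizeof switch_pairs[0]; s++)
            if (!strcmp(used[i], switch_pairs[s][0]) ||
                !strcmp(used[i], switch_pairs[s][1])) {
                n = add_unique(out, n, max_out, switch_pairs[s][0]);
                n = add_unique(out, n, max_out, switch_pairs[s][1]);
            }
        for (size_t a = 0; a < sizeof anim_frames / sizeof anim_frames[0]; a++)
            for (int f = 0; f < 4; f++)
                if (!strcmp(used[i], anim_frames[a][f]))
                    for (int g = 0; g < 4; g++)
                        n = add_unique(out, n, max_out, anim_frames[a][g]);
    }
    return n;
}
```

Weapon sprites would need a similar fixed list, since no map object references them.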

These were missing (and apparently not generated):

Code:

flats\CONS1_7.bmp
flats\FLAT19.bmp
flats\NUKAGE1.bmp
flats\FLAT3.bmp
flats\FLAT23.bmp
flats\FLAT20.bmp
Profile for TIMEBASE_CONTROL=3 of polygon.wad:

Code:

Executed instructions:
  35.81%                   521574445                       R_AdvanceSurface_flatshade
  14.72%                   214426668                       R_VisPlaneFlatShader
  11.17%  11.18%  17.50%   162691559 162889433 254942402   _P_RunThinkers
   6.15%                    89574955                       R_SpriteColumnShader_Masked2
   3.18%                    46248158                       R_BSPHyperPlane
   2.04%   2.06%   2.13%    29718082  29984386  30966973   _BM_P_CrossBSPNode
   1.80%   1.80%   1.80%    26237557  26252202  26252202   _BM_A_Mux1x2
   1.71%                    24913536                       R_AddLine_loop
   1.60%   1.63%   1.69%    23295379  23694875  24544259   stack_visplane_area
   1.27%   1.27%   1.55%    18492668  18506593  22600460   get_flat_floor
   1.23%                    17963522                       R_DrawSurface_flatshade
   1.19%                    17384133                       R_StackTransparentSurface
   1.07%   1.07%   1.12%    15599720  15640620  16290031   _P_UpdateSpecials
   1.06%   1.06%   1.09%    15403350  15437040  15915107   R_ViewTestSpriteLines
   0.83%                    12116210                       build_ssector
   0.74%   0.74%   0.76%    10834773  10844711  11024783   _R_PointInSubsector
   0.71%   0.71%   0.76%    10346532  10368213  11107043   R_AddSpriteSpans
   0.65%   0.65%   0.66%     9399312   9412926   9672635   R_SetSubSectorLuma
   0.64%   0.64%   0.65%     9297293   9309664   9460413   D_FlatMipGen_8_16
   0.58%   0.59%   2.76%     8504687   8521787  40253749   _P_LookForPlayers
   0.57%   0.57%   0.62%     8319303   8331502   9100285   _R_ClearPlanes
Although I've now played it several times, I still get this as the worst render frame:

Code:

Time spent in profile = 1.02134s.
...
Executed instructions:
  65.60%  65.68%  66.81%     1548752   1550844   1577529   D_FlatMipGen_8_16
   8.41%   8.42%  17.64%      198477    198902    416493   D_FlatMipRemap_16_8
   8.33%   8.34%   8.97%      196634    196990    211721   D_TextureRemapInit
   5.25%   5.27%   5.46%      124032    124331    128847   correct_element
   3.95%                       93283                       R_AdvanceSurface_flatshade
   1.70%   1.70%   1.75%       40084     40149     41273   D_FlatPackMipPages
   1.68%   1.68%   6.83%       39622     39722    161255   D_FlatCLUTGen_32x32
   1.52%   1.52%   1.52%       35900     35940     35940   _BM_A_Mux1x2
Maybe because those flats weren't generated (although BM checks for them every time early in the level)?

Even with TIMERBASE_CONTROL=3, recursive sound handling is the highest CPU user in the worst thinking frame:

Code:

Time spent in profile = 0.16218s.
...
Visits/calls:
  58.60% 2386.93%        2367     96408   _P_RecursiveSound
   8.05%   8.15%         325       329   _PIT_AddLineIntercepts_L
...
Executed instructions:
  39.97%  40.05% 1718.04%      110970    111201   4770022   _P_RecursiveSound
  14.11%  14.13%  26.43%       39167     39240     73387   _P_RunThinkers
  10.35%  10.36%  25.95%       28743     28775     72062   _P_PathTraverse
   7.22%   7.24%   8.60%       20045     20101     23870   _PIT_AddLineIntercepts_L
   3.76%   3.77%   3.77%       10449     10469     10469   _BM_A_Mux3x2
   3.30%   3.30%   3.30%        9172      9172      9172   _R_PointInSubsector
   2.55%   2.55%   2.55%        7071      7071      7071   _BM_A_Mux2x2
   1.47%   1.47%   3.59%        4071      4082      9963   _P_CheckPosition
   1.30%   1.30%   1.30%        3604      3611      3611   _P_UpdateSpecials
   1.22%   1.23%   1.23%        3386      3406      3406   _BM_P_CrossBSPNode
   1.21%   1.22%   4.42%        3355      3379     12281   _PTR_ShootTraverse
   1.06%   1.07%   1.07%        2948      2968      2968   _P_PointOnDivlineSide
   0.98%   0.98%   2.05%        2714      2721      5689   _PIT_AddThingIntercepts
   0.92%                        2561                       R_StackTransparentSurface
   0.68%   0.68%   1.98%        1876      1883      5508   _P_LookForPlayers
   0.59%   0.59%   0.65%        1633      1633      1793   _PIT_CheckLine
   0.57%   0.57%   0.57%        1574      1574      1574   _P_LineOpening
   0.50%   0.50%   2.18%        1394      1394      6045   _P_CheckSight
...
Instruction cache misses:
  19.74%  19.80%  20.19%        8121      8147      8307   _PIT_AddLineIntercepts_L
  18.80%  18.83%  56.03%        7734      7749     23055   _P_PathTraverse
   9.48%   9.56%  32.99%        3900      3934     13573   _P_RunThinkers
   3.95%   3.96%   7.11%        1626      1629      2924   _PIT_AddThingIntercepts
   3.62%   3.65%   8.30%        1490      1501      3415   _PTR_ShootTraverse
   3.27%   3.28%   7.43%        1344      1350      3059   _P_CheckPosition
   3.12%   3.15%   3.15%        1284      1295      1295   _P_PointOnDivlineSide
   2.70%   2.70%   2.94%        1112      1112      1208   _PIT_CheckLine
   2.29%   2.29%   2.29%         944       944       944   _P_LineOpening
   2.05%   2.05%   2.05%         842       842       842   _R_PointInSubsector
   1.92%   1.92%  17.60%         792       792      7243   _P_SetMobjState
Note that although RecursiveSound's share of the frame was larger, the frame itself took 0.16 s instead of 0.26 s.
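A quick check with the executed-instruction counts from the two worst thinking frames supports this: P_RecursiveSound executed about the same number of instructions in both (111765 vs. 110970, with 2364 vs. 2367 visits), so its larger share comes from the rest of the frame shrinking. Dividing its own instruction count by its share gives the approximate frame totals:

```latex
\begin{aligned}
\text{earlier profile:}\quad & \frac{111765}{0.2559} \approx 4.37\times10^{5}\ \text{instructions}\\
\text{TIMERBASE\_CONTROL=3:}\quad & \frac{110970}{0.3997} \approx 2.78\times10^{5}\ \text{instructions}\\
\text{ratio:}\quad & \frac{2.78}{4.37} \approx 0.64 \quad\text{vs.}\quad \frac{0.16218\,\text{s}}{0.26271\,\text{s}} \approx 0.62
\end{aligned}
```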

A single P_NoiseAlert() call from A_WeaponReady() causes 36 calls to P_RecursiveSound(), which then calls itself 2331 times. I guess a rate limit in P_NoiseAlert() alone isn't enough?
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:This pretty much removes it from the profiling but it does make the AIs a bit sleepy when you surprise them. It should probably be adjusted a bit to 0.5 seconds or so.
For comparison, here's something on human reaction time:
http://en.wikipedia.org/wiki/Mental_chronometry#Types

Beasts probably should have better reaction time than humans, but it depends on whether the stimulus for reaction is expected or not and how nasty we want the monsters to be. :-)
