Bad Mood : Falcon030 'Doom'

dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

I haven't uploaded a test since a lot of stuff got reworked, so here's an interim build. There are a few glitches I will have to fix (including a missing wall in the 2nd room of e1m1) but it's almost back together and the FPS reading is up since last time.
BM407Fa.zip
There is plenty of code tidying and several ugly DSP code-level optimizations needing done but it should be in good shape soon to have some fun with the Doom code. I'm getting bored with staring at DSP code anyway :) a break from that will be good.

Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

When comparing profiles for identical, automated BadMood runs, there are some differences.

I think these come from differences in how many cycles have elapsed between booting the machine and starting the program (I skip the boot-up memory test by pressing Enter manually, so its timing can differ), and in the initial IKBD clock value (which gets its boot-up value from the host). Both can change the points at which system interrupts happen during the program run.

If you do add VBL syncing at some point, it might be good to check whether that significantly decreases the differences between runs.

For the DSP, this variance is not something to worry about, as 0.05% is the largest difference I've seen there:

Code: Select all

-p:03fc  0aa981 0003fc  (06 cyc)  jclr #1,x:$ffe9,p:$03fc                          28.32% (384106, 2304636, 0)
+p:03fc  0aa981 0003fc  (06 cyc)  jclr #1,x:$ffe9,p:$03fc                          28.27% (383462, 2300772, 0)
...
-p:072a  0aa980 00072a  (06 cyc)  jclr #0,x:$ffe9,p:$072a                           0.01% (178, 1068, 0)
+p:072a  0aa980 00072a  (06 cyc)  jclr #0,x:$ffe9,p:$072a                           0.03% (443, 2658, 0)
I'm just wondering whether this interrupt timing difference is also the reason for DSP cycle differences when there are no differences in instruction counts:

Code: Select all

 $01fac8 :             move.l    #$80000000,d7              0.00% (56, 448, 0)
 $01face :             move.l    #$40000000,d5              0.00% (56, 672, 56)
-$01fad4 :             move.l    d7,d3                      0.02% (752, 3008, 2)
+$01fad4 :             move.l    d7,d3                      0.02% (752, 3008, 0)
 $01fad6 :             mulu.l    d3,d3,d4                   0.02% (752, 36096, 0)
-$01fada :             cmp.l     d6,d4                      0.02% (752, 3008, 57)
+$01fada :             cmp.l     d6,d4                      0.02% (752, 3008, 56)
 $01fadc :             bgt.s     $1faea                     0.02% (752, 5344, 0)
?

CPU side profile differences are naturally larger, but probably still smaller than Hatari's Falcon emulation CPU cycle accuracy issues, so I wouldn't worry about that either. :-)
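
A comparison like the one above can be automated. The following is a minimal Python sketch, not part of Hatari or its post-processing scripts: it assumes only the trailing "percent (a, b, c)" layout visible in the listings posted here, and the helper names parse_line and compare_runs are made up for illustration.

```python
import re

# Trailing "NN.NN% (first, second, third)" part of a profile disassembly line,
# as it appears in the listings posted in this thread. The meaning of the
# individual fields is not assumed here; they are just carried along.
STATS = re.compile(r"(\d+\.\d+)%\s+\((\d+),\s*(\d+),\s*(\d+)\)\s*$")


def parse_line(line):
    """Return (address, percent, first, second, third), or None if no stats found."""
    match = STATS.search(line)
    if match is None:
        return None
    # Drop a leading diff marker ("+", "-" or space) before taking the address.
    address = line.lstrip("+- ").split()[0]
    percent, first, second, third = match.groups()
    return address, float(percent), int(first), int(second), int(third)


def compare_runs(run_a, run_b):
    """Map address -> per-field delta (run_b minus run_a) for shared addresses."""
    stats_a = {p[0]: p[1:] for p in map(parse_line, run_a) if p}
    deltas = {}
    for parsed in map(parse_line, run_b):
        if parsed and parsed[0] in stats_a:
            old = stats_a[parsed[0]]
            deltas[parsed[0]] = tuple(round(n - o, 2) for n, o in zip(parsed[1:], old))
    return deltas


run_a = ["p:03fc  0aa981 0003fc  (06 cyc)  jclr #1,x:$ffe9,p:$03fc   28.32% (384106, 2304636, 0)"]
run_b = ["p:03fc  0aa981 0003fc  (06 cyc)  jclr #1,x:$ffe9,p:$03fc   28.27% (383462, 2300772, 0)"]
print(compare_runs(run_a, run_b))
```

For the p:03fc line this reports -0.05 percentage points, 644 fewer executions and 3864 fewer cycles, which is 6 cycles per skipped iteration and matches the "(06 cyc)" column.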

dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Thanks for the diagnostics.

I have also noticed small differences between runs and had put it down to interrupts on the CPU and perhaps differences in initial state between the DSP and CPU (a kind of 'wakeup mode' where the DSP resets on odd/even CPU cycles).

TBH cascading effects from one frame to another with cache and interrupts on the CPU are probably enough complexity to produce bizarre timing drifts which are difficult to analyse, esp. without a vsync to start each pass.

I can add a vsync option to the commandline or something to force it for profiling etc.

It's reassuring though to see there are no nasty spikes caused by bugs :)

Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

Eero Tamminen wrote:CPU side profile differences are naturally larger, but probably still smaller than Hatari's Falcon emulation CPU cycle accuracy issues, so I wouldn't worry about that either. :-)
While on the CPU side the effect is spread wider (due to interrupts happening at "random" places), at function level the effect isn't really larger, except for cache misses, where it can apparently sometimes be >0.1%:

Code: Select all

 Executed instructions:
- 60.07%             237540  render_wall_1x1
+ 60.08%             237540  render_wall_1x1
  13.60%              53790  clearlongs
- 10.13%              40038  render_flats_1x1
-  3.27%    3.28%     12920  stream_texture
-  2.50%               9870  dividing_node
-  1.80%    1.81%      7112  stack_visplane_area
-  1.27%               5018  segment_loop
-  1.15%    1.15%      4546  add_partition_segment
+ 10.13%              40034  render_flats_1x1
+  3.27%    3.29%     12920  stream_texture
+  2.48%               9814  dividing_node
+  1.79%    1.81%      7096  stack_visplane_area
+  1.27%               5022  segment_loop
...
 Instruction cache misses:
- 15.17%               1928  render_wall_1x1
- 12.52%               1591  segment_loop
-  5.71%    5.71%       726  process_lighting
-  4.52%                574  memory_handle
-  3.87%   32.71%       492  add_wall_segment
-  3.68%                468  dividing_node
-  3.55%                451  seg_prelight_done
+ 14.90%               1890  render_wall_1x1
+ 12.54%               1591  segment_loop
+  5.72%    5.72%       726  process_lighting
+  4.53%                574  memory_handle
+  3.89%   32.33%       494  add_wall_segment
+  3.82%                485  dividing_node
+  3.56%                451  seg_prelight_done
(This was for the 1st rendered frame, from 1st "r_begin" to next "r_begin".)
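
To spot which functions actually changed between two such runs, the per-function listings can be compared by symbol. This is only a sketch under the assumption that each row looks like the ones posted above ("percent, optional second percent, count, symbol"); parse_listing and changed_symbols are illustrative names, not part of any existing tool.

```python
import re

# One row of a per-function listing as posted above:
#   "percent  [optional second percent]  count  symbol"
ROW = re.compile(r"^\s*(\d+\.\d+)%(?:\s+\d+\.\d+%)?\s+(\d+)\s+(\S+)\s*$")


def parse_listing(text):
    """Return {symbol: count} for every row that matches the assumed layout."""
    counts = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            counts[match.group(3)] = int(match.group(2))
    return counts


def changed_symbols(before, after):
    """Return {symbol: (old, new)} for symbols in both runs whose counts differ."""
    old, new = parse_listing(before), parse_listing(after)
    return {s: (old[s], new[s]) for s in old if s in new and old[s] != new[s]}


# Instruction cache miss rows taken from the two runs quoted above.
before = """
 15.17%               1928  render_wall_1x1
 12.52%               1591  segment_loop
  3.87%   32.71%       492  add_wall_segment
  3.68%                468  dividing_node
"""
after = """
 14.90%               1890  render_wall_1x1
 12.54%               1591  segment_loop
  3.89%   32.33%       494  add_wall_segment
  3.82%                485  dividing_node
"""
for symbol, (old, new) in changed_symbols(before, after).items():
    print(f"{symbol}: {old} -> {new} ({new - old:+d})")
```

Comparing raw counts rather than percentages avoids the effect seen with segment_loop, whose share moves from 12.52% to 12.54% even though its miss count stays at 1591.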

dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: While on the CPU side the effect is spread wider (due to interrupts happening at "random" places), on function level the effect actually isn't really larger, except for cache misses, there it apparently can sometimes be >0.1%
Aha - watch out for the 'clearlongs' which is clearing the framebuffer - it is configured to do this only 3 times (the first time each of the 3 backbuffers is visited), after which it will stop. I forgot to mention this.

So it's probably better to measure from the 4th or 5th frame in, earliest.

From the relative weight of the wall rendering, I assume this is an e4mX (e4m2?) level? Those levels have huge wall counts! :) Good for profiling.

Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:Aha - watch out for the 'clearlongs' which is clearing the framebuffer - it is configured to do this only 3 times (the first time each of the 3 backbuffers is visited), after which it will stop. I forgot to mention this. So it's probably better to measure from the 4th or 5th frame in, earliest.
OK, I'm now profiling the following:
1. startup until "r_begin" -> badmood-startup-*
2. 1st frame until next "r_begin" -> badmood-1st-frame-*
3. skip 7 next "r_begin" instances
4. frames until 8th "r_begin" -> badmood-8-frames-*
5. frame until "r_end" -> badmood-1-frame

Profiles 2) and 4) can be used to compare relative costs between the first and later frames; 4) and 5) can be used to compare relative costs between just rendering the frame (5th profile) and the full frame (4th profile).

From what I can see, the relative costs of the items in 4) & 5), on both the CPU & DSP sides, are within about 1% of each other, so I think looking at the full frame cost is fine.
dml wrote:From the relative weight of the wall rendering, I assume this is an e4mX (e4m2?) level? Those levels have huge wall counts! :) Good for profiling.
Yes, the difference between Doom1 and Doom2 WAD costs is pretty radical.

Startup

Doom1, 17.0s:

Code: Select all

Executed instructions:
 46.01%   48.27%  17762954  flat_generate_mips
 17.76%   17.80%   6856544  flat_remap_mips
 13.03%            5029082  render_patch_direct
  5.75%    5.77%   2217984  correct_element
  3.56%    3.59%   1376259  create_quick_alpha
  2.18%    2.18%    840994  mip_plot_16bit
...
Instruction cache misses:
 56.33%   73.52%   1232677  flat_generate_mips
 16.47%   16.51%    360479  mip_plot_16bit
  8.48%    8.74%    185577  flat_remap_mips
  7.79%             170520  ROM_TOS
Doom2, 24.3s:

Code: Select all

Executed instructions:
 44.83%   47.03%  25026563  flat_generate_mips
 17.62%   17.65%   9833640  flat_remap_mips
  9.90%            5526118  render_patch_direct
  5.46%    5.48%   3050443  strcmp_8
  4.91%    4.93%   2743296  correct_element
  3.35%    3.87%   1869338  build_directory_hash
  2.47%    2.48%   1376259  create_quick_alpha
  2.12%    2.13%   1185037  mip_plot_16bit
...
Instruction cache misses:
 53.89%   70.34%   1736922  flat_generate_mips
 15.76%   15.80%    507944  mip_plot_16bit
  8.13%    8.38%    261972  flat_remap_mips
  6.07%    9.66%    195654  locate_entry_q
  5.51%             177616  ROM_TOS
  3.71%    3.82%    119487  strcmp_8
Rendering 8 frames, CPU side

Doom1, 1.15s:

Code: Select all

Executed instructions:
 44.01%             881624  render_wall_1x1
 30.24%             605812  render_flats_1x1
  5.16%    5.18%    103360  stream_texture
  4.04%    4.08%     81022  stack_visplane_area
  3.16%              63332  dividing_node
  2.14%    2.14%     42788  add_partition_segment
  1.94%              38818  segment_loop
...
Instruction cache misses:
 13.04%              11316  segment_loop
 12.48%              10829  render_wall_1x1
  8.56%    8.62%      7432  process_lighting
  4.67%               4051  dividing_node
  4.31%               3744  build_ssector
  4.20%               3648  memory_handle
  3.80%   10.42%      3294  visplane_tryflush
  3.59%   28.28%      3120  add_wall_segment
  3.24%               2816  seg_prelight_done
  2.93%               2541  render_flats_1x1
  2.81%    3.14%      2441  stack_visplane_area
  2.80%   17.03%      2428  render_wall
Doom2, 1.42s:

Code: Select all

Executed instructions:
 72.85%            2187804  render_wall_1x1
  9.46%             284228  render_flats_1x1
  3.44%    3.46%    103360  stream_texture
  2.93%    2.95%     87976  stack_visplane_area
  2.78%              83582  dividing_node
  1.37%              41096  segment_loop
...
Instruction cache misses:
 16.21%              15737  render_wall_1x1
 12.58%              12211  segment_loop
  5.98%    5.99%      5808  process_lighting
  4.73%               4592  memory_handle
  4.05%   34.64%      3936  add_wall_segment
  3.74%               3628  dividing_node
  3.72%               3608  seg_prelight_done
  3.14%   22.13%      3050  render_wall
Rendering frames, DSP side

Doom1:

Code: Select all

Used cycles:
 37.88%           14009246  VPRenderPlaneDT_
 24.67%   24.73%   9122644  perspected_column
 10.03%            3707982  command_base
  8.47%            3130786  SetTexture
  4.27%    4.27%   1578496  extract_subvisplane
  2.96%            1093216  AddLowerWall
  2.67%             987888  AddUpperWall
  1.80%             664928  AddMidWall
  1.17%             433504  project_node
  0.94%             347408  NodeInCone
  0.91%             338014  R_ViewTestBufferSeg
Doom2:

Code: Select all

Used cycles:
 53.62%   53.68%  24362428  perspected_column
 13.55%            6155958  VPRenderPlaneDT_
 10.09%            4586496  command_base
  6.89%            3130834  SetTexture
  3.81%    3.81%   1729756  extract_subvisplane
  2.24%            1017024  AddLowerWall
  1.97%             894368  AddUpperWall
  1.52%             690128  AddMidWall
  0.82%             373060  R_ViewTestBufferSeg
  0.78%             354880  project_node
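
As a rough sanity check, the 8-frame CPU timings above can be turned into per-frame numbers. This sketch assumes that 1.15s and 1.42s are the emulated durations of the 8 profiled frames (with profiling enabled), so the results are indicative only; frame_stats is just an illustrative helper.

```python
def frame_stats(total_seconds, frames):
    """Return (seconds per frame, frames per second) for a timed run."""
    seconds_per_frame = total_seconds / frames
    return seconds_per_frame, 1.0 / seconds_per_frame


# Timings and instruction counts are taken from the listings above.
timings = {"Doom1": 1.15, "Doom2": 1.42}
for wad, seconds in timings.items():
    spf, fps = frame_stats(seconds, 8)
    print(f"{wad}: {spf:.3f} s/frame, {fps:.2f} fps")

# Growth of the wall renderer's instruction count between the two WADs.
wall_ratio = 2187804 / 881624
print(f"render_wall_1x1 executes {wall_ratio:.2f}x as many instructions with Doom2")
```

That works out to roughly 7.0 fps for Doom1 against 5.6 fps for Doom2 in these runs, with render_wall_1x1 executing about 2.5 times as many instructions.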

Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

The script for doing the profiling, post-processing the data and generating the graphs is attached. It requires the latest Hatari from Mercurial, BM, its CPU & DSP symbols and a Doom WAD.

I've attached graphs from run with Doom2 WAD. One for DSP side doing 8 frames, and graphs for CPU side of startup and doing 8 frames.

CPU side graphs have heavier filtering applied to them than for the DSP side because CPU side callgraph would be otherwise quite a bit more complex. With the latest Hatari caller info collection fixes I think the graphs look pretty good. :-)

dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Thanks, that's quite interesting already.

The cache miss information does in fact correlate with both my expectations and where I had been focusing for a while - shrinking code in those areas to cache better on the CPU side. And the impact from those changes is measurable.

e.g. a lot of old CPU code was collapsed into R_ViewTestBufferSeg, which takes a tiny amount of time now on the DSP (cool!). But it's also interesting that perspected_column and VPRenderPlaneDT can exchange places so radically...

While looking at this stuff it's important to bear in mind that while an operation may dominate, it doesn't necessarily mean a bottleneck. It's quite complicated to profile concurrent processors - I've been focusing on the host port 'spin' loops on both sides to find the blockages as it's the only thing that does not mislead.

Still, the more dominant each bit of code, the more likely it will result in a blockage at some time, even briefly during a frame. So it's helpful to see.
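
To illustrate the spin-loop idea: in the DSP listing posted earlier, both jclr lines branch back to their own address, which is what a busy-wait looks like. The sketch below picks such lines out of a DSP profile and sums their share. It assumes the line layout of that listing and only looks at jclr/jset instructions; spin_loops is a made-up helper, not an existing tool.

```python
import re

# A DSP profile line as posted earlier in the thread, e.g.
#   p:03fc  0aa981 0003fc  (06 cyc)  jclr #1,x:$ffe9,p:$03fc   28.32% (384106, 2304636, 0)
LINE = re.compile(
    r"^p:(?P<addr>[0-9a-f]{4})\s.*\s(?P<op>jclr|jset)\s+\S+,p:\$(?P<target>[0-9a-f]{4})"
    r"\s+(?P<pct>\d+\.\d+)%\s+\((?P<count>\d+),\s*(?P<cycles>\d+),"
)


def spin_loops(lines):
    """Yield (address, percent, count, cycles) for jclr/jset that jump to themselves."""
    for line in lines:
        match = LINE.match(line)
        if match and match["addr"] == match["target"]:
            yield match["addr"], float(match["pct"]), int(match["count"]), int(match["cycles"])


profile = [
    "p:03fc  0aa981 0003fc  (06 cyc)  jclr #1,x:$ffe9,p:$03fc   28.32% (384106, 2304636, 0)",
    "p:072a  0aa980 00072a  (06 cyc)  jclr #0,x:$ffe9,p:$072a    0.01% (178, 1068, 0)",
]
loops = list(spin_loops(profile))
total = sum(percent for _, percent, _, _ in loops)
print(f"{len(loops)} wait points, {total:.2f}% of the profile spent spinning")
```

In that run the wait at p:03fc alone accounts for over a quarter of the DSP profile, which is the kind of reading that points at a blockage rather than at an expensive routine.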

It looks like the cache miss information is broadly correct (even if we're not sure how precisely correct it is for individual instructions or small spans of instructions). So this gives me more confidence that it's usable and worth paying attention to. I have been pretty wary of it so far, as it is relatively new and some of the individual readings are hard to make sense of!

So this is good stuff.

Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:While looking at this stuff it's important to bear in mind that while an operation may dominate, it doesn't necessarily mean a bottleneck. It's quite complicated to profile concurrent processors - I've been focusing on the host port 'spin' loops on both sides to find the blockages as it's the only thing that does not mislead.
A spinloop also consumes instructions, so if it's a bottleneck, it should at least be clearly visible in the profile, shouldn't it?

If these spin/wait points are embedded into larger functions, it may help to have separate labels before and after the spin points that clearly name what is being waited for at that point. :-)

dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: Spinloop also consumes instructions, so if it's a bottleneck, it should be at least clearly visible in the profile, shouldn't it?
Indeed, that's what I've been using it for :)
Eero Tamminen wrote: If these spin/wait points are embedded into larger functions, it may help if you have separate labels before and after the spin points, and they clearly name what is being waited at that point. :-)
I do have another tool which finds/highlights them automatically (spreadsheet checks ratio of spin path vs exit path encounters) but yes labels could be used to help find them by other means or by eye.
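
The spin-vs-exit check can be sketched roughly as below. This is not the spreadsheet mentioned above, just one plausible reading of it: a wait instruction that executed N times and fell through E times wasted N - E iterations, so (N - E) / E is the average number of spins per pass. The execution counts come from the DSP listing earlier in the thread; the exit counts are invented purely for illustration, and both function names are hypothetical.

```python
def spin_ratio(executed, exits):
    """Average number of wasted spins per pass through a wait point."""
    if exits <= 0:
        raise ValueError("wait point was never exited")
    return (executed - exits) / exits


def rank_wait_points(wait_points):
    """Sort {name: (executed, exits)} by spins per exit, worst first."""
    ratios = (
        (name, spin_ratio(executed, exits))
        for name, (executed, exits) in wait_points.items()
    )
    return sorted(ratios, key=lambda item: item[1], reverse=True)


# Executed counts: from the posted DSP profile.
# Exit counts: HYPOTHETICAL values, made up for this example.
wait_points = {
    "p:072a": (178, 170),
    "p:03fc": (384106, 1000),
}
for name, ratio in rank_wait_points(wait_points):
    print(f"{name}: {ratio:.2f} spins per exit")
```

A ratio near zero means the other side was nearly always ready; a large ratio marks a point where one processor regularly stalls waiting for the other.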

dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Some progress recently, but a cold has slowed me down a bit.

I could use some help soon, so if anyone has the time... I need to locate a recorded 'demo' loop which is compatible with linuxdoom1.10 (the public source). This allows various things to be done inside the Doom code before it becomes playable. There seem to be many annoying issues with recorded demo file compatibility and Doom code versions, WAD types & versions so I didn't get very far with my initial search for files.

If a usable demo file can be found or created that would be very helpful (sorry, I don't have time to try a big selection of them at random - one that is known to work will be most helpful, and enough).

Note: Any recorded demo file probably needs to match a specific WAD so I'd need that information along with the file.

(I'll make my own if I have to but it eats into the limited time I have on this project).

dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Current status with BM... in the 'builtin profiler' screengrab below, items marked with a green star are nearing their limits for optimization using the current approach. They aren't expected to get much faster, radical changes excluded. Yellow star = might get a bit faster still. Orange = still room left. Red = lots of room left, next thing needing attention.

The other screengrab shows metrics for the same scene - wall, floor span, wall column count & BSP walk complexity. This is an expensive scene (one that used to upset my old 486DX) and I turned wall pixel plotting off so the wall work is all DSP limited to get a better idea for limits.

I have also started working on the game code and refactoring BM to fit with that.
lastprf.png
lastmet.png

Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:I have also started working on the game code and refactoring BM to fit with that.
Sounds great! Whenever you have something runnable, I would be interested to profile it. :-)

Are you now compiling the 68k assembly with Vasm and building the rest with gcc?

For CPU code we could switch to using DRI/GST symbols in the binary itself, instead of exporting & importing the symbols separately into the Hatari debugger. I guess Vasm etc. can still output a.out objects; GCC just needs to be told to link the final result into "traditional format" (haven't tried that myself yet though).

Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

Eero Tamminen wrote:Yes, the difference between Doom1 and Doom2 WAD costs is pretty radical.
I forgot to mention that for some reason Doom1 shows up in BM in some "letterbox" display format (see the attached screenshot), unlike Doom2 WAD which showed up normally. That might explain some of the differences...

dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: Sounds great! Whenever you have something runnable, I would be interested to profile it. :-)
There are quite a few stages involved - most of it getting the asm project reorganised for linking to C without all the 'WAD viewer' bits attached, fixing all the interdependencies. But once I have them joined, linking and running I'll let you know.
Eero Tamminen wrote: Are you now compiling the 68k assembly with Vasm and building the rest with gcc?
I've started reorganising for it, but not there yet. Did some experiments with the scrolling STE demo to make sure it would work. I also have a strange problem loading LOD files directly at runtime which is preventing it working fully outside Hatari+Devpac (LOD2BIN is a TTP and must be run inside Hatari on each build of the DSP code :-z ). Probably just some XBios/DSP call ordering dependency I need to refresh my memory on - it works sometimes but not others.

I think there may be a problem with my DSP library, as it was written when the Falcon was really new and the docs were still based on guidelines for Sparrow/Falcon compatibility, but looking at recent documentation for Falcon xbios it looks different (absolute trap numbers, not based off a base index query). This may be the cause of my problems with it.
Eero Tamminen wrote: For CPU code we could switch into using DRI/GST symbols in the binary itself, instead of exporting & importing the symbols separately to Hatari debugger. I guess Vasm etc can still output a.out objects, GCC just needs to be told to link the final result into "traditional format" (haven't tried that myself yet though).
I found 'symbols prg' worked great when I last tried it the other day, using the new sources (started using the new profiler but not tried the postprocess/callgraph bits yet).

Things will all be easier once it's reorganised for gcc+vasm.

dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: I forgot to mention that for some reason Doom1 shows up in BM in some "letterbox" display format (see the attached screenshot), unlike Doom2 WAD which showed up normally. That might explain some of the differences...
In fact it's just flat-shading the floor and ceiling in black because the VP shader is disabled in that build. It's working ok.

Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:Probably just some XBios/DSP call ordering dependency I need to refresh my memory on - it works sometimes but not others.
Hatari can give a trace of those calls with:

Code: Select all

--bios-intercept --trace xbios
Hatari should show arguments for almost all XBios calls now, but if something is missing or you need more info in the traces, just mail hatari-devel. And for the DSP there are of course additional tracing options.

dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

I can probably live without a linuxdoom1.10 demo loop file now - fixed the code to run with the demo loops inside the commercial WAD files. This grab is from the Doom II 'clean port' running its demo loop in Hatari.
d2loop.png

Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote: fixed the code to run with the demo loops inside the commercial WAD files. This grab is from the Doom II 'clean port' running its demo loop in Hatari.
Does "clean port" mean that it doesn't yet have the BM stuff integrated into it? If yes, I wonder why you're getting frame skips (FS=1) with plain CPU code...

PS. You're quite naughty in posting these teaser pics, when you know how eager people are to test new versions (and in my case, profile them). ;)

dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: Does "clean port" mean that it doesn't yet have the BM stuff integrated into it?
Yes it's just the original Doom source hacked up and built for TOS, but with some changes to allow it to run the attract mode and display graphics (which is important, since I have no keyboard input yet and it's the only other way to prove the game code is running properly on m68k - while it is portable, it is also full of thorns).
Eero Tamminen wrote: If yes, I wonder why you're getting frame skips (FS=1) with plain CPU code...
Under what circumstances should we see frame skips?
Eero Tamminen wrote: PS. You're quite naughty in posting these teaser pics, when you know how eager people are to test new versions (and in my case, profile them). ;)
:angel:


I will be clear about it when I get the two units joined up. Currently still building a makefile and splitting up existing source files for the new version of the engine.

Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:Yes it's just the original Doom source hacked up and built for TOS, but with some changes to allow it to run the attract mode and display graphics (which is important, since I have no keyboard input yet and it's the only other way to prove the game code is running properly on m68k - while it is portable, it is also full of thorns).
Hm. If you've had time to get rid of the gettimeofday() stuff featuring largely in the first linuxdoom TOS profile, profiling the CPU only demo mode might also be interesting to see whether there's something new visible, e.g. for code adding player, gun & enemy sprites. Providing debug symbols requires just the right gcc linker flag so getting them is now easy to automate. :-)
dml wrote:
Eero Tamminen wrote: If yes, I wonder why you're getting frame skips (FS=1) with plain CPU code...
Under what circumstances should we see frame skips?
Only when your PC is too slow to run emulation at full speed (if you would use fast-forward, it would be at maximum frame skip, which is by default 5, not at FS=1).

If the DSP side is idling, a 2GHz machine should be more than enough, but with plain CPU code you can also disable the DSP completely ("--dsp none") and have things running faster, especially with fast-forward mode ("--fast-forward yes", with which BM and Doom should work fine). Using an "old UAE" CPU core build of Hatari (with which BM & Doom work fine too) could also give a little more speed with fast-forward.
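
For automated runs, those two options could be assembled in a small wrapper along these lines. This is only a sketch: the option strings are the ones quoted above, while the hatari_args helper and the "doom.ttp" program name are made up for the example.

```python
def hatari_args(program, use_dsp=True, fast_forward=False):
    """Build a Hatari command line for a CPU-only or DSP-enabled run."""
    args = ["hatari"]
    if not use_dsp:
        # Plain CPU code does not need DSP emulation at all.
        args += ["--dsp", "none"]
    if fast_forward:
        args += ["--fast-forward", "yes"]
    args.append(program)
    return args


# "doom.ttp" is a placeholder name, not the actual binary from this thread.
print(" ".join(hatari_args("doom.ttp", use_dsp=False, fast_forward=True)))
```

The resulting list can be handed to subprocess.run() as-is, which avoids any shell quoting issues.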

Profiling adds some overhead, especially when one adds symbols before profiling, so that it collects caller information. Was that screenshot with profiling enabled?

dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: Hm. If you've had time to get rid of the gettimeofday() stuff featuring largely in the first linuxdoom TOS profile, profiling the CPU only demo mode might also be interesting to see whether there's something new visible, e.g. for code adding player, gun & enemy sprites.
I'll look at this later and see what's going on. It's probably being used as a portable fallback for realtime measurement, and is meant to be replaced.
Eero Tamminen wrote: Providing debug symbols requires just the right gcc linker flag so getting them is now easy to automate. :-)
I did try "-Wl,--traditional-format", followed by 'symbols prg TEXT DATA BSS' in the debugger, and that appeared to work.

However I have noticed that very few symbols from the program are visible in the debugger. I'm still looking into this.
Eero Tamminen wrote: Only when your PC is too slow to run emulation at full speed (if you would use fast-forward, it would be at maximum frame skip, which is by default 5, not at FS=1).
It could be a range of things - DSP emulation was on, this laptop isn't incredibly fast, and I was logged in remotely from another machine at the time :-)

Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:I did try "-Wl,--traditional-format", followed by 'symbols prg TEXT DATA BSS' in the debugger, and that appeared to work.

However I have noticed that very few symbols from the program are visible in the debugger. I'm still looking into this.
What if you also build the code with "-g"? Probably without that you get only global symbols.

dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Bad Mood : Falcon030 'Doom'

Post by dml »

Eero Tamminen wrote: What if you also build the code with "-g"? Probably without that you get only global symbols.
In fact I get thousands of .Lxxxx local symbols and some global symbols even without -g. With -g I get an extra 500k on the executable and I don't see any obvious difference in visible debugger symbols.
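
One way to see what the symbol table really contains is to dump it as text and separate the compiler-generated .L labels from everything else. The sketch below assumes an nm-style "address type name" listing; the sample entries, their addresses and the split_symbols helper are all hypothetical.

```python
def split_symbols(lines):
    """Split nm-style 'address type name' lines into (.L locals, other symbols)."""
    local, named = [], []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        address, kind, name = parts
        entry = (int(address, 16), kind, name)
        (local if name.startswith(".L") else named).append(entry)
    return local, named


# HYPOTHETICAL sample listing; addresses and names are made up.
sample = [
    "00010000 T _main",
    "00010024 t .L12",
    "00010030 t .L13",
    "0002a000 D _screen_buffer",
]
local, named = split_symbols(sample)
print(f"{len(local)} local labels, {len(named)} other symbols")
```

Filtering the .L labels out before loading the symbols into a debugger or profiler would keep cost attribution at function level instead of splitting it across compiler-internal labels.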

Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Bad Mood : Falcon030 'Doom'

Post by Eero Tamminen »

dml wrote:In fact I get thousands of .Lxxxx local symbols and some global symbols even without -g. With -g I get an extra 500k on the executable and I don't see any obvious difference in visible debugger symbols.
Could you mail the binary to me with information about your compiler version and example(s) of some symbols that should and should not be visible?

I can then investigate it a bit and compare that to what I'll get out of my own programs (compiled with native GCC v2.95 from Sparemint).

I've just updated Hatari manual's debugger section (for new debug symbol handling, breakpoint stuff and profiler), so it would be good to get it right...
