DOOM on atari st

dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: DOOM on atari st

Post by dml »

GokMasE wrote:Out of curiosity, how many FPS could the latest test binary manage in average?
(Unless I am mistaken that ought to be the first one with alternating roof height, and wall pieces bobbing up and down)

I figure it could be interesting to get a rough idea of what to expect from the upcoming one ;)
I think the last one I actually posted was STray5.zip. It averaged about 9fps.

However, there are some important differences. STray5 was a 16-colour version (with coarse dithering) versus the dual-field, 50-60 colour versions I have been measuring lately. It also used a much simpler map with very few height changes (never more than one per ray, whereas the current version has a proper staircase with ~5 transitions per ray), a short view distance, a lower FOV (flattened projection, so most of the pixels drawn are wall texture, which is at least 1.5x faster than floor texture) and just one texture applied everywhere.

So it probably serves as a decent comparison for fps, but it was slower code working on simpler data and doing less work overall.

^^^

I think the fps is also skewed a bit higher by the fact that STray5 spends some time 'inside walls', rendering no pixels or only half of a wall etc. These periods are brief in terms of seconds, but many frames get rendered during them, which drags the average fps up quite a bit.

Recent versions have a kind of collision detection to keep the camera out of walls, so the average is more honest.
Cyprian
10 GOTO 10
Posts: 3362
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: DOOM on atari st

Post by Cyprian »

Hey Doug, may I ask you to release the source code for the current state, as you did with STray01.zip?
It would be really useful for learning how to mix C and asm code, how to optimise etc.
thanks
ATW800/2 / V4sa / Lynx I / Mega ST 1 / 7800 / Portfolio / Lynx II / Jaguar / TT030 / Mega STe / 800 XL / 1040 STe / Falcon030 / 65 XE / 520 STm / SM124 / SC1435
DDD HDD / AT Speed C16 / TF536 / SDrive / PAK68/3 / Lynx Multi Card / LDW Super 2000 / XCA12 / SkunkBoard / CosmosEx / SatanDisk / UltraSatan / USB Floppy Drive Emulator / Eiffel / SIO2PC / Crazy Dots / PAM Net
http://260ste.atari.org
GokMasE
Captain Atari
Posts: 323
Joined: Sun Mar 02, 2003 11:16 pm
Location: Sweden

Re: DOOM on atari st

Post by GokMasE »

dml wrote:I think the fps is also skewed a bit higher by the fact that STray5 spends some time 'inside walls', rendering no pixels or only half of a wall etc. These periods are brief in terms of seconds, but many frames get rendered during them, which drags the average fps up quite a bit.

Recent versions have a kind of collision detection to keep the camera out of walls, so the average is more honest.
Ah yes, I noticed the camera wasn't limiting its movements to the free space in the 3d world ;)

Knowing the current scheme is delivering more colours moving through a more complex and demanding map while managing to (probably) slightly exceed the speed of the previous build, I'd say this side project has really started to become a main act :-D

Surely, there must be a bunch of game coders waiting to take on the challenge to do something really creative with this engine once it is finished/released? I mean, seriously?
Combining both the cryptochrome technique and the textured 3d rendering of a map with multiple heights in realtime.. ..this has literally moved the limits of what an ST can do :)
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: DOOM on atari st

Post by dml »

Cyprian wrote:Hey Doug, may I ask you to release the source code for the current state, as you did with STray01.zip?
It would be really useful for learning how to mix C and asm code, how to optimise etc.
thanks
Yes I can make a snapshot available before I do more with it.

In fact much of the code has now been moved to .s assembler files using VASM, but the mixed C/asm versions of everything are still there and still work if the right makefile #defines are used. The main program is still C.


There are some reasonably useful guides on inline assembly with GCC, here are a few...

http://bus-error.nokturnal.pl/tiki-read ... rticleId=2
http://www.cs.virginia.edu/~clc5q/gcc-inline-asm.pdf
http://www.ibiblio.org/gferg/ldp/GCC-In ... HOWTO.html
http://asm.sourceforge.net/articles/rmi ... ne-asm.txt
http://www.ethernut.de/en/documents/arm-inline-asm.html


...but I think none of them really covers all of the features, and for some of the more advanced cases the examples can be insufficient. I haven't found a really good single reference which covers everything clearly, e.g. how to safely modify a C register variable in an asm block. There are also nasty gotchas, such as the compiler switching between two different asm syntaxes depending on the presence of the ": : :" register lists at the end (no register lists = old syntax, generally incompatible with the new syntax), or just rejecting blocks with empty register lists as bad syntax (solve this by adding a fake "cc" condition code clobber).

Most of the tutorials are x86 too, but the general principles are not platform specific.
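
To illustrate the point about safely modifying a C variable inside an asm block, here is a minimal sketch (not taken from the STray source). It uses GCC extended asm with a read-write operand and the "cc" clobber mentioned above. The function name add_one is made up for illustration, and the m68k instruction line is an untested assumption about what the m68k assembler accepts. Other targets fall back to plain C so the snippet still builds with a host compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: increments a value held in a C variable.
 * "+d"(value) tells GCC the operand is both read and written and must
 * live in an m68k data register, so the compiler never assumes the old
 * value survives the block. "cc" says the condition codes are trashed. */
int32_t add_one(int32_t value)
{
    assert(value < INT32_MAX);            /* avoid signed overflow */
#if defined(__GNUC__) && defined(__m68k__)
    /* Assumed m68k syntax; adjust to whatever your assembler accepts. */
    __asm__ volatile ("addq.l #1,%0" : "+d"(value) : : "cc");
#else
    value += 1;                           /* portable fallback */
#endif
    return value;
}
```

Because the block has operand lists, it is parsed as extended asm, so operands are written as %0, %1 and so on rather than as hard-coded register names.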
Last edited by dml on Thu Oct 24, 2013 10:43 am, edited 1 time in total.
yerzmyey
Atari Super Hero
Posts: 602
Joined: Fri Sep 19, 2008 12:23 pm

Re: DOOM on atari st

Post by yerzmyey »

Like Mr Spock used to say - fascinating. ;)
http://ym-digital.i-demo.pl/ ATARI 520ST music-band
http://ay-riders.speccy.cz/ ZX Spectrum music-band
http://yerzmyey.i-demo.pl/ ZX/A500/A1200/ST/XL music
https://soundcloud.com/yerzmyey ZX/A500/A1200/ST/STE/F030 music
http://z80.i-demo.pl/ MP3 archive of Z80 chip music
No good deed will escape unpunished.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: DOOM on atari st

Post by dml »

GokMasE wrote: Ah yes, I noticed the camera wasn't limiting its movements to the free space in the 3d world ;)
It is just a bounce-off-solid-stuff hack but it's enough for viewing the map.
GokMasE wrote: Knowing the current scheme is delivering more colours moving through a more complex and demanding map while managing to (probably) slightly exceed the speed of the previous build, I'd say this side project has really started to become a main act :-D
:) You can blame Eero for talking me into drawing walls! The original experiment was the floors. But I now have a few extra tricks to use with rays so it's a win.
GokMasE wrote: Surely, there must be a bunch of game coders waiting to take on the challenge to do something really creative with this engine once it is finished/released? I mean, seriously?
Combining both the cryptochrome technique and the textured 3d rendering of a map with multiple heights in realtime.. ..this has literally moved the limits of what an ST can do :)
I think a lot of ST coders are mainly interested in doing their own experiments and moving on (like me probably). But there are some out there who are more interested in putting good gameplay together and don't have time for the details. So maybe it's a fit for that.

I'm interested in working on a game but it's not a quick project. Tech can be quick to develop from scratch into a basic demo but building a game takes serious time. A big chunk of the work is invisible too so commitment is a big part of it. Anyone who's tried it will know what I mean :)
Cyprian
10 GOTO 10
Posts: 3362
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: DOOM on atari st

Post by Cyprian »

dml wrote:Yes I can make a snapshot available before I do more with it.

In fact much of the code has now been moved to .s assembler files using VASM, but the mixed C/asm versions of everything are still there and still work if the right makefile #defines are used. The main program is still C.


There are some reasonably useful guides on inline assembly with GCC, here are a few...
You're right, there are a lot of examples on the internet,
but here thanks to you (and Vincent for his great cross-mint-cygwin tool) we have a working (and compilable) example dedicated for Atari in one ZIP file :)
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: DOOM on atari st

Post by dml »

Cyprian wrote:(and Vincent for his great cross-mint-cygwin tool) we have a working (and compilable) example dedicated for Atari in one ZIP file :)
Yes without Vincent's cross-compiler port and good emulators all of this would be more painful for sure :)
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: DOOM on atari st

Post by dml »

Thought I'd mention one of the tricks I use to get C converted into assembly quickly.

It can be painful to do this from scratch by hand if the function is big. There's usually a lot of table indexing and pointer magic which is easy to express in C but turns into a mess of scaling and indirection in assembler, and it can take a lot of time to get right - especially when the routine won't work properly at all until it is finished. It's also annoying when the code you spend the most time on is not the code you want to focus on for optimization.


So a shortcut is to take the working C function and hollow out the 'interesting' core (e.g. drawing pixels or whatever), leaving just the basic skeleton with all the messy outer parts. Replace the core with a printf message (or equivalent) which forces the compiler to calculate all the outputs necessary for the core to work (i.e. a dummy core which pretends to do something). You then run the routine through the emulator/profiler and capture the disassembly for that. Selecting different levels of compiler optimization will make the resulting code more or less easy to follow.

Note: The bigger and more complex the code you want to capture, the lower you probably want to set the optimization level until you've mined and named all the variables etc.

The disassembly you get isn't optimal for the job but it can be a good guide since removing the core frees up lots of registers. The important thing is that it's in assembler form, and it's correct. This leaves you with a working first cut to build out from. You can then implement the core by hand, and rework the remaining captured code to suit. The painful step of building up a working skeleton has been sidestepped.


This approach isn't always better than doing it by hand. Especially if you're not converting from C in the first place :) It helps more when the core is quite large and adds to confusion, and where the core doesn't create critical data for other parts of the program. But it tends to make more complex conversions take less time where the core can be mocked up in a trivial way.
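
As a rough illustration of the 'hollowed-out core' trick described above, the sketch below keeps the outer skeleton of a column loop (the table lookups and pointer arithmetic) and replaces the pixel-drawing core with a dummy that merely consumes its inputs. Every name here (column_t, dummy_core, render_columns_skeleton, g_sink) is invented for the example and is not from the STray source. A volatile sink stands in for the printf, as one possible 'equivalent' that forces the compiler to compute the values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-column record, standing in for the real data. */
typedef struct {
    uint16_t texture_index;  /* which texture to sample            */
    uint16_t top, bottom;    /* screen rows covered, bottom >= top */
    uint32_t u_step;         /* 16.16 fixed-point texture step     */
} column_t;

/* Volatile sink: writes to it cannot be optimised away, so every
 * value passed to dummy_core() must really be computed. */
volatile uint32_t g_sink;

/* Dummy core: pretends to draw by consuming all of its inputs. */
void dummy_core(const uint8_t *texels, uint8_t *dest,
                uint32_t u_step, unsigned rows)
{
    g_sink += (uint32_t)(uintptr_t)texels;
    g_sink += (uint32_t)(uintptr_t)dest;
    g_sink += u_step;
    g_sink += rows;
}

/* Skeleton: all the messy indexing stays in C; only the core is a stub.
 * Returns the number of times the core was invoked. */
unsigned render_columns_skeleton(const column_t *cols, size_t count,
                                 const uint8_t *const *textures,
                                 uint8_t *screen, size_t pitch)
{
    unsigned calls = 0;
    assert(cols != NULL && textures != NULL && screen != NULL);
    for (size_t x = 0; x < count; ++x) {
        const column_t *c = &cols[x];
        if (c->bottom < c->top)
            continue;                          /* nothing visible */
        const uint8_t *texels = textures[c->texture_index];
        uint8_t *dest = screen + (size_t)c->top * pitch + x;
        dummy_core(texels, dest, c->u_step,
                   (unsigned)(c->bottom - c->top) + 1u);
        ++calls;
    }
    return calls;
}
```

Compiling something like this at a low optimization level and capturing the disassembly of render_columns_skeleton gives the skeleton in assembler form, with the call to dummy_core marking where the hand-written core goes.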
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: DOOM on atari st

Post by Eero Tamminen »

I added to Doug's HG repo a script that profiles and post-processes the execution with Hatari.

This is from a GCC 2.x build where I needed to disable some things and symbols can be missing, so it's just an example. There were only a couple of symbols that are visited while the demo is running; from those I chose "init_visplanes" as the one I use to indicate frame change.

Where time might go in startup:

Code:

Time spent in profile = 185.46300s.
...
Visits/calls:
  10.47%  10.47%      165453    165453   ___cmpdf2
   7.27%              114961             ___norm_df
   6.90%   6.90%      108997    108997   ___mulsf3
   6.11%   6.11%       96539     96539   ___extendsfdf2
   5.88%  11.76%       92968    185874   ___ltdf2
   5.45%   5.45%       86109     86109   ___fixdfsi
   5.45%   5.45%       86058     86058   _modf
   5.44%  22.21%       86030    351015   _floor
   4.74%   4.74%       74849     74849   ___divsf3
   4.15%   8.29%       65568    131091   ___gedf2
   4.15%  16.59%       65556    262127   ___fixunsdfsi
...
Used cycles:
  14.23%  17.59%  17.59%   211636344 261735768 261735768   ___mulsf3
  13.47%  12.13%  12.13%   200431004 180421764 180421764 * ___divsf3
   7.24%   7.93%   7.93%   107730176 118023920 118023920   ___muldf3
   7.02%   7.18%   7.18%   104503180 106774796 106774796   _duplicate_column__FRC7fcolumnR7
   6.21%   5.59%   5.59%    92421104  83088912  83088912 * ___cmpdf2
   6.17%   6.11%   6.11%    91859888  90925024  90925024 * ___divdf3
   5.51%   0.00%   0.00%    81968664      1224      1224 * ___static_initialization_and_des
   4.75%                    70614352                       ___norm_df
   3.49%   3.07%   3.07%    51919704  45698568  45698568 * ___adddf3
   3.21%                    47818844                       ROM_TOS
   2.41%   2.46%   2.46%    35841252  36612936  36612936   ___fixdfsi
   2.37%   2.42%   2.42%    35214064  35963428  35963428   ___extendsfdf2
   2.01%   2.05%   2.05%    29863856  30570080  30570080   ___floatsidf
   1.99%   2.03%   2.03%    29554748  30169084  30169084   _generate68k_column__FPUsPsiii
   1.95%   2.01%  81.11%    29064064  29908360 1206637636   _onetime_init_raycasting
   1.85%   6.11%   6.11%    27543356  90859688  90859688   _modf
   1.71%   0.16%   0.16%    25479276   2386172   2386172 * ___cmpsf2
   1.67%   1.71%  12.72%    24795868  25402120 189269468   _floor
   1.51%   1.55%   1.55%    22501888  23008908  23008908   ___udivsi3
   1.50%   1.53%   1.53%    22247020  22698540  22698540   ___truncdfsf2
   1.18%   1.21%   5.90%    17610920  18010496  87706392   ___fixunsdfsi
   1.03%   1.05%   4.32%    15279316  15609084  64242884   ___ltdf2
   0.93%   0.88%   0.88%    13777324  13075484  13075484 * ___addsf3
   0.74%   0.76%   5.86%    10966452  11248300  87226600   _onetime_init_tables__FPc
   0.72%   0.74%   2.82%    10775252  11017984  41891656   ___gedf2
   0.69%   0.71%  16.96%    10320972  10582316 252365396   _sqrt
Worst frame based on used CPU instructions (not cycles):

Code:

Time spent in profile = 0.12890s.
...

Visits/calls:
- max = 6, in _shifter_vbl_asm2 at 0x1a25e, on line 1675
- 21 in total
Executed instructions:
- max = 311, in _render_drawplanes_68k+670 at 0x1928a, on line 830
- 79509 in total
Used cycles:
- max = 7140, in _render_drawplanes_68k+542 at 0x1920a, on line 783
- 1033968 in total

Visits/calls:
  19.05%  19.05%           4         4   allocate_visplane
   4.76%  23.81%           1         5   _project_walls_68k
...
Executed instructions:
  23.38%  23.45%  23.54%       18593     18643     18719   _project_walls_68k
  20.60%  20.63%  20.63%       16380     16405     16405   _render_drawplanes_68k
  13.35%  13.35%  13.35%       10612     10612     10612   _raycast_world_68kv2
  13.23%  13.30%  13.30%       10521     10571     10571   _c2pzoom_96_ccpairsq_dualfield
  13.01%  16.18%  16.18%       10348     12862     12862   _render_columns
  12.62%  12.66%  12.66%       10038     10063     10063   _scanconvert_vpgroup_68k
   3.16%                        2514                       _etext

Used cycles:
  26.59%  26.74%  26.74%      274956    276460    276460   _c2pzoom_96_ccpairsq_dualfield
  19.74%  19.81%  19.81%      204084    204836    204836   _render_drawplanes_68k
  18.26%  18.41%  18.52%      188848    190352    191472   _project_walls_68k
  12.67%  12.67%  12.67%      131008    131008    131008   _raycast_world_68kv2
  10.08%  13.77%  13.77%      104260    142420    142420   _render_columns
   8.07%   8.14%   8.14%       83440     84192     84192   _scanconvert_vpgroup_68k
   3.69%                       38160                       _etext
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: DOOM on atari st

Post by dml »

Eero Tamminen wrote:I added to Doug's HG repo a script that profiles and post-processes the execution with Hatari.

This is from a GCC 2.x build where I needed to disable some things and symbols can be missing, so it's just an example. There were only a couple of symbols that are visited while the demo is running; from those I chose "init_visplanes" as the one I use to indicate frame change.
That's interesting. So the floor drawing gets expensive in the worst bits - probably the staircase edges. I'll look at it again but there might not be a lot of room for speedups, except fill-area problems and the area is currently small.

Still, the profile doesn't change that much. It's pretty flat for a worst frame. I'm reasonably happy with that - just need to be careful with choices in map design.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: DOOM on atari st

Post by dml »

Some coding notes today.

Just quickly tested a (probably) better way to get subtexel accurate texturing with generated code spans. Idea seems to work, but no drawing routines written yet. Just playing with the generator.


The idea is to allow the codegen pass to search for 'approximate hits' for a given number of subtexel offset fraction bits, while generating the code for the span. It may find decent approximations at random points along the span, depending on the rate of the span. These get logged as they are found in an indexing table.

For spans where a good approximation can't be found for all offsets within the length of the span, a unique span can be generated at small extra cost. This turns out to be very rare. For cases where hits are found within the span length, the span length is adjusted upwards by the start position of the last hit (so all N offsets can complete a span as before).

The key to making it work is allowing a flexible amount of error which is rate-dependent, plus recording the amount of offset compensation needed to begin rendering from a given offset into the generated code. You only need to store a small table for this because there are a limited number of offsets being searched (currently 4, but maybe increased to 16 when it's working).

All spans need to generate more code to do this, but offsets are commonly found early.

I'm trying to calculate the cost of storing the extra generated code - but given that the majority of span rates can fill all 4 subtexel slots within the first 20 pixels, the average cost is something like 15-20% more than without subtexel... (versus 400% for the simpler solution - generating unique code for every rate AND subtexel offset).

You can save some more space by allowing -ve as well as +ve approximations but this adds some complications - overrunning the start of the texture row. This is bad if the textures are not padded and your c2p routine accepts only pre-shifted pixels, as mine does (crash!).

Overall, using this should (A) speed up wall column drawing, because only one call is needed per column to achieve subtexel accuracy (instead of splitting the column at the horizon / zero fraction), and (B) take less space than generating spans in both directions (upper/lower spans). It is more fiddly to dispatch a call though, and it requires storing the subtexel fraction in the column itself, which I previously managed to avoid.

So that's it. Somebody might find this route interesting to play with.

[EDIT]

Some output from the generator, showing texture U stepping rate for each generated span, and the 4 offsets which produce approximate hits for 0.25 texel offsets. The vast majority of stepping rates get solved within about 12 steps, which costs basically nothing. The occasional spike occurs, but doesn't really impact storage. The error tolerance is 1/4 the texel rate, which seems to be good enough in most cases but could probably be reduced.

Code:

rate: [00001780]
subtexel [00000000] found >> match: a:[00000000] i:0 c:0
subtexel [00004000] found >> match: a:[00004680] i:3 c:3
subtexel [00008000] found >> match: a:[00007580] i:5 c:5
subtexel [0000c000] found >> match: a:[0000bc00] i:8 c:8
rate: [00001800]
subtexel [00000000] found >> match: a:[00000000] i:0 c:0
subtexel [00004000] found >> match: a:[00004800] i:3 c:3
subtexel [00008000] found >> match: a:[00007800] i:5 c:5
subtexel [0000c000] found >> match: a:[0000c000] i:8 c:8
rate: [00001880]
subtexel [00000000] found >> match: a:[00000000] i:0 c:0
subtexel [00004000] found >> match: a:[00003100] i:2 c:2
subtexel [00008000] found >> match: a:[00007a80] i:5 c:5
subtexel [0000c000] found >> match: a:[0000c400] i:8 c:8
rate: [00001900]
subtexel [00000000] found >> match: a:[00000000] i:0 c:0
subtexel [00004000] found >> match: a:[00003200] i:2 c:2
subtexel [00008000] found >> match: a:[00007d00] i:5 c:5
subtexel [0000c000] found >> match: a:[0000c800] i:8 c:8
rate: [00001980]
subtexel [00000000] found >> match: a:[00000000] i:0 c:0
subtexel [00004000] found >> match: a:[00003300] i:2 c:2
subtexel [00008000] found >> match: a:[00007f80] i:5 c:5
subtexel [0000c000] found >> match: a:[0000b280] i:7 c:7
rate: [00001a00]
subtexel [00000000] found >> match: a:[00000000] i:0 c:0
subtexel [00004000] found >> match: a:[00003400] i:2 c:2
subtexel [00008000] found >> match: a:[00008200] i:5 c:5
subtexel [0000c000] found >> match: a:[0000b600] i:7 c:7
rate: [00001a80]
subtexel [00000000] found >> match: a:[00000000] i:0 c:0
subtexel [00004000] found >> match: a:[00003500] i:2 c:2
subtexel [00008000] found >> match: a:[00008480] i:5 c:5
subtexel [0000c000] found >> match: a:[0000b980] i:7 c:7
a: is the texture U accumulator at that pixel offset
i: is the pixel offset
c: is the amount of offset compensation needed (currently always == offset, because only +ve approximations have been allowed in this run)
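
The search that produces a listing like the one above can be sketched roughly as follows. This is a reconstruction, not the actual generator: the names (subtexel_hit_t, find_subtexel_hits, extended_span_length) are invented, the rates are assumed to be 16.16 fixed point, and the fixed tolerance of 0x1000 is an assumption picked because it reproduces the offsets in the listing, whereas the post describes the real tolerance as rate-dependent. Only +ve offsets are searched, as in that run.

```c
#include <assert.h>
#include <stdint.h>

#define SUBTEXEL_SLOTS 4          /* 0.00, 0.25, 0.50, 0.75 texel     */
#define FRAC_MASK      0xFFFFu    /* fractional part of a 16.16 value */
#define TOLERANCE      0x1000u    /* assumed: 1/4 of the slot spacing */

typedef struct {
    uint32_t accum;   /* "a:" texture U accumulator at the hit */
    uint16_t offset;  /* "i:" pixel offset of the hit          */
    uint8_t  found;   /* 0 if no hit within the search limit   */
} subtexel_hit_t;

/* Step the U accumulator by `rate` per pixel and record, for each
 * subtexel slot, the first pixel whose fractional U lies within
 * TOLERANCE of the slot. Returns the number of slots filled. */
int find_subtexel_hits(uint32_t rate, unsigned max_pixels,
                       subtexel_hit_t hits[SUBTEXEL_SLOTS])
{
    int filled = 0;
    assert(rate != 0);
    for (int s = 0; s < SUBTEXEL_SLOTS; ++s) {
        uint32_t target = (uint32_t)s * (0x10000u / SUBTEXEL_SLOTS);
        uint32_t accum = 0;
        hits[s].accum = 0;
        hits[s].offset = 0;
        hits[s].found = 0;
        for (unsigned i = 0; i < max_pixels; ++i, accum += rate) {
            uint32_t frac = accum & FRAC_MASK;
            uint32_t err = frac > target ? frac - target : target - frac;
            if (err < TOLERANCE) {
                hits[s].accum = accum;
                hits[s].offset = (uint16_t)i;
                hits[s].found = 1;
                ++filled;
                break;
            }
        }
    }
    return filled;
}

/* Span length extended by the start position of the last hit, so a
 * span begun at any recorded offset can still run `length` pixels. */
unsigned extended_span_length(unsigned length,
                              const subtexel_hit_t hits[SUBTEXEL_SLOTS])
{
    unsigned last = 0;
    for (int s = 0; s < SUBTEXEL_SLOTS; ++s)
        if (hits[s].found && hits[s].offset > last)
            last = hits[s].offset;
    return length + last;
}
```

For rate 0x1780 this finds the hits at pixel offsets 0, 3, 5 and 8, matching the first block of the listing. A rate of exactly 0x8000 never lands near the 0.25 or 0.75 slots, which is the kind of case where a unique span would have to be generated instead.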

I already manage to keep the number of generated rates really low, and precision really high via mipmapping. The texel rate (for the chosen mipbias) is always bounded in the range between 0.5 and 2.0 so nearly all of the generated rates are spent on precision. That's a triple-win. ;-)
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: DOOM on atari st

Post by Eero Tamminen »

dml wrote:That's interesting. So the floor drawing gets expensive in the worst bits - probably the staircase edges. I'll look at it again but there might not be a lot of room for speedups, except fill-area problems and the area is currently small.
Remember that for the GCC 2.x build I had to use the plain C version for a couple of functions, and it otherwise doesn't optimize as well as GCC 4.x. I also didn't check whether symbols contained extra instructions (i.e. whether there was code between symbols in the profile disassembly that looked like it should have subroutine symbols, but hadn't).

And it detects the worst frame by number of executed instructions, not by used cycles. From the profile I posted, you can see that there's quite a large difference in used cycles & instructions for subroutines. :-)

It's better if you can run the profile/profile.sh script yourself and check the output directly.
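
To make the instructions-versus-cycles point concrete, here is a small sketch using the first-column figures from the worst-frame listing earlier in the thread. The struct and function names are invented for the example; only the numbers come from the profile output.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *symbol;
    unsigned long instructions;  /* first "Executed instructions" count */
    unsigned long cycles;        /* first "Used cycles" count           */
} profile_row_t;

/* Figures copied from the worst-frame listing. */
const profile_row_t worst_frame[] = {
    { "_project_walls_68k",             18593, 188848 },
    { "_render_drawplanes_68k",         16380, 204084 },
    { "_raycast_world_68kv2",           10612, 131008 },
    { "_c2pzoom_96_ccpairsq_dualfield", 10521, 274956 },
    { "_render_columns",                10348, 104260 },
    { "_scanconvert_vpgroup_68k",       10038,  83440 },
};
const size_t worst_frame_rows = sizeof worst_frame / sizeof worst_frame[0];

/* Index of the heaviest row, ranked by cycles or by instructions. */
size_t top_row(const profile_row_t *rows, size_t n, int by_cycles)
{
    size_t best = 0;
    assert(rows != NULL && n > 0);
    for (size_t i = 1; i < n; ++i) {
        unsigned long cur = by_cycles ? rows[i].cycles : rows[i].instructions;
        unsigned long top = by_cycles ? rows[best].cycles
                                      : rows[best].instructions;
        if (cur > top)
            best = i;
    }
    return best;
}

/* Average cycles per instruction, scaled by 10 to stay in integers. */
unsigned long cycles_per_instruction_x10(const profile_row_t *row)
{
    assert(row != NULL && row->instructions > 0);
    return row->cycles * 10ul / row->instructions;
}
```

Ranked by instructions the heaviest routine is _project_walls_68k, but ranked by cycles it is _c2pzoom_96_ccpairsq_dualfield, which averages roughly 26 cycles per instruction against roughly 10 for the wall projection.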
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: DOOM on atari st

Post by dml »

Eero Tamminen wrote: Remember that for the GCC 2.x build I had to use the plain C version for a couple of functions, and it otherwise doesn't optimize as well as GCC 4.x. I also didn't check whether symbols contained extra instructions (i.e. whether there was code between symbols in the profile disassembly that looked like it should have subroutine symbols, but hadn't).
In fact if you're building with the USE_VASM_OPTI define (which I expect you must be) then most of that other C code is disabled/replaced anyway.

I just checked in a few fixes which allow the original C version to work - it had got broken.


There is a big chunk of generated code which does confuse profiling and all of those cycles collect under the next available item - sometimes PROGRAM_TEXT, sometimes ROM, sometimes the VBL code. I have got used to it by now - it's easy to spot anyway as it starts at a high address, non-contiguous with all the other code. It also has no labels.
Eero Tamminen wrote: And it detects the worst frame by number of executed instructions, not by used cycles. From the profile I posted, you can see that there's quite a large difference in used cycles & instructions for subroutines. :-)
That's a good point, although it's still a reasonable guide.

The strange divergence in cycles/ops is a result of much hand-optimization :-)
Eero Tamminen wrote: It's better if you can run the profile/profile.sh script yourself and check the output directly.
Ok. It looks reasonable to me but I'll report if I find anything really strange.

BTW the new makefile still works for me.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: DOOM on atari st

Post by dml »

dml wrote: Just quickly tested a (probably) better way to get subtexel accurate texturing with generated code spans. Idea seems to work, but no drawing routines written yet. Just playing with the generator.
I did get this working over the weekend, although it came with a much bigger bag of problems than I expected. The problems did get solved and the C[+codegen] version was running ok. It looks no worse than before, and should be simpler to convert the last bits to asm and to add new features (like texture pegging/offsetting/scrolling), even if the code generator is much more complicated than it was...

Got a cold now, so probably won't do much in the evenings until at least next week.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: DOOM on atari st

Post by dml »

dml wrote: It currently runs at 8.7fps at 96x60 pixels (plus x2 zoom) in dual-field mode. That means the final framerate could be somewhere around 9.0 - 9.3fps with the same config.
With the last of the routines now converted into 68k it measures 9.24fps in dual-field mode - towards the higher predicted figure.

With single-field coarse dither it measures 10.4fps which is also not very far from the 11fps I was looking for.

There are still lots of mid-to-small optimizations which can be done all over the place but I'm going to move on from here and hopefully make a more interesting demo out of it.

As an added bonus I managed to do the subtexel stuff properly without splitting wall columns in the middle, and saved some RAM in the process. Texture pegging/height adjustment should now also work - although there's no info in the map yet for that and it really would demand a map editor to apply that level of detail sensibly.
bullis1
Fuji Shaped Bastard
Posts: 2301
Joined: Tue Dec 12, 2006 2:32 pm
Location: Canada

Re: DOOM on atari st

Post by bullis1 »

The fps gap between single and dual-field has really narrowed. Great work.

A map editor would be really neat but I guess it's only useful to someone making a game out of it. A 2D grid for layout with a 3D mode for selecting and aligning textures and adjusting ceiling/floor heights would be best (just like BUILD or modern DooM editors).

EDIT: Also what is RAM usage looking like nowadays?
Member of the Atari Legend team
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: DOOM on atari st

Post by dml »

bullis1 wrote:The fps gap between single and dual-field has really narrowed. Great work.

A map editor would be really neat but I guess it's only useful to someone making a game out of it. A 2D grid for layout with a 3D mode for selecting and aligning textures and adjusting ceiling/floor heights would be best (just like BUILD or modern DooM editors).
A map editor would be nice but yes, a bit of a luxury :) Fortunately it's a simple format, so one could be bashed together in some high level language or other if needed.
bullis1 wrote: EDIT: Also what is RAM usage looking like nowadays?
It's still pretty enormous but there is waste I need to clean up. Have started getting numbers from the different areas. The dual-field c2p system does cost a lot - big tables and 4 framebuffers. The tables can be made smaller but it slows down the c2p (which is already a bottleneck - there's actually more room to slow down the drawing if anything).

The good news is that the big footprint doesn't scale with the map and wall textures. It's pretty constant and the leveldata is relatively cheap. So if the constant parts can be optimized down a bit, there should be room for other stuff.

The primary 'hog' is the floor which takes a lot per individual texture but there are ways to shrink that which I haven't played with yet. Obvious ways to reduce footprint include cutting the texture resolution in half (32 vs 64 wide) and using rotation-invariant textures, reducing the number of precalc angles by 2 or 4. Those two things are easy and would push the floor texture cost into the background. For the kind of muted floor textures used in a game at this resolution, I'm not sure the drop from 64 to 32 would actually be noticed either.

There are other methods which I have considered but are more complicated to code (like burning the tiling into codegen - which combined with the other stuff I'm doing makes it extra hard!).
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: DOOM on atari st

Post by dml »

The static part measures 1.24mb of RAM including all code, tables and buffers, except for the codegen buffer which gets created at runtime. There are some other temporary buffers but they don't need counted in the total.

At least half of the code is waste because there are 5 or 10 different unrolled c2p routines and a big chunk of C library nonsense including a float math lib used during startup for table gen. So while the TOS file is 200k, the actual engine code is quite small. The symbol table indicates the interesting bit is under 40k of code.

The codegen buffer is currently 320k (and increases nearly proportional to window height). So that's about 1.6mb in total without any graphics loaded.

The precalc floor texture used in the test is 1.15mb (!) and each wall texture is 20k. Obviously any time spent optimizing RAM should be spent on the floor texture technique. Currently the cost is disproportionate to its value onscreen, in context with everything else.

So the running total is around 2.75mb 'accounted for' (EDIT: make that 3mb including wall textures). Some of which is waste or can be reduced. I don't know if this is the whole story yet but it does seem to cover everything important. If it uses more than that it's probably a mistake and can be fixed.
bullis1
Fuji Shaped Bastard
Posts: 2301
Joined: Tue Dec 12, 2006 2:32 pm
Location: Canada

Re: DOOM on atari st

Post by bullis1 »

3MB is not that bad! It doesn't leave much room for expansion however so I can see the need to shrink it.

And yes, I doubt the drop to 32-wide floor textures would be much of a sacrifice (visually) at this resolution and viewing angle.

Is it possible to use a mixture of efficient rotationally symmetrical floor textures and typical asymmetrical textures in the same map? I suppose this hasn't been added yet but it seems like the engine could handle it judging by your previous explanations of how the floor tables/drawing works.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: DOOM on atari st

Post by dml »

bullis1 wrote: Is it possible to use a mixture of efficient rotationally symmetrical floor textures and typical asymmetrical textures in the same map? I suppose this hasn't been added yet but it seems like the engine could handle it judging by your previous explanations of how the floor tables/drawing works.
Yes, good point. In fact the angle mappings are encoded in the texture file format so the file can 'alias' the missing 180' (for example) without the renderer knowing. All stored spans are linked off a row array and the row array does the aliasing (duplicating or dropping of unused rows etc.). This was the easiest way to get that capability without mucking up the drawing code.

The texture generator tool needs to be told which textures should be symmetric and the actual symmetry period (and I still need to do that bit) but it's easy enough.
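
The row-array aliasing could look something like the sketch below. The real texture file format isn't shown in the thread, so everything here is an assumption made for illustration: the names (floor_rows_t, build_row_aliases, stored_row_for_angle), the MAX_ANGLES bound, and the simple modulo rule for a texture that repeats every `period` angles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ANGLES 256   /* hypothetical upper bound for the sketch */

typedef struct {
    uint16_t num_angles;        /* angles the renderer asks for */
    uint16_t num_stored;        /* span sets actually stored    */
    uint16_t row[MAX_ANGLES];   /* angle -> stored span set     */
} floor_rows_t;

/* Build the row array for a texture that repeats every `period`
 * angles. period == num_angles means no symmetry (nothing aliased). */
void build_row_aliases(floor_rows_t *t, uint16_t num_angles, uint16_t period)
{
    assert(t != NULL);
    assert(num_angles > 0 && num_angles <= MAX_ANGLES);
    assert(period > 0 && num_angles % period == 0);
    t->num_angles = num_angles;
    t->num_stored = period;
    for (uint16_t a = 0; a < num_angles; ++a)
        t->row[a] = (uint16_t)(a % period);
}

/* Renderer-side lookup: unaware of whether the row is an alias. */
uint16_t stored_row_for_angle(const floor_rows_t *t, uint16_t angle)
{
    assert(t != NULL && angle < t->num_angles);
    return t->row[angle];
}
```

With 64 angles and a period of 32 (a texture that looks the same after a 180' turn), only 32 span sets need storing, and the drawing code still indexes by the full angle, so symmetric and asymmetric textures can share the same renderer.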
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: DOOM on atari st

Post by dml »

I've not been posting much on new stuff / progress recently but I've been busy on all fronts and shaking off a cold.

There is plenty in the works though both on this project and other Atari stuff. Several unfinished items will probably start to appear (and/or hidden extras revealed) around the same time and there's at least one other demo underway with some new stuff in it.

So I'm not gone - a bit less noise and more action maybe ;)

As for STray - there are some improvements on the way but I'll probably not post an update until some of those things are actually implemented.

In the meantime though, posts won't be so frequent.
Desty
Atari God
Posts: 1990
Joined: Thu Apr 01, 2004 2:36 pm
Location: 53 21N 6 18W

Re: DOOM on atari st

Post by Desty »

This is an amazing thread! It's been great to watch the limits of the humble ST pushed so far beyond what was previously known...
tá'n poc ar buile!
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: DOOM on atari st

Post by dml »

Thanks.

It is still in progress but won't be in a good state until sometime early next year - coding time permitting as always.
AnthonyJ
Captain Atari
Posts: 165
Joined: Sat Jan 26, 2013 8:16 am

Re: DOOM on atari st

Post by AnthonyJ »

dml wrote:This is probably the *primary* reason Carmack had to drop raycasting after Wolf. The state hashing for floor/ceiling areas immediately cancels most of the benefits. I wouldn't be surprised if he tried the same thing shortly after Wolf3D and came to the same conclusion. By switching to BSP and generating visplanes per 'room' instead of per screen column, the problem goes away, but then so does the need for grid based map... and the next thing you know you have the basis for 'Doom'
Carmack has just done this interview: http://www.wired.com/gamelife/2013/12/j ... mack-doom/ - in which he said:
There were some intermediate steps that I did for a technology engine that I did for Origin at the time that added lighting and floors and ceilings—it was still tile-based, but the graphics side of things that people look at, the core things where it had lighting, where you could have flashing darkened areas, was no longer tile-based
(that intermediate game was ShadowCaster - don't think I saw that one at the time, dunno how I missed it!). Is this the missing link that you were guessing existed?
