Quake 2 on Falcon030

dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

I'll try to answer some of that but it will need to be tomorrow - got stuff on this evening...
EvilFranky
Atari Super Hero
Posts: 926
Joined: Thu Sep 11, 2003 10:49 pm
Location: UK

Re: Quake 2 on Falcon030

Post by EvilFranky »

VladR - AtariAge, that's where I recognise the username from.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Quake 2 on Falcon030

Post by Eero Tamminen »

VladR wrote:
dml wrote:Only porting & optimizing - that's not exactly accurate. In fact such an approach would fail if tried.
Well, not in a literal sense. There's obviously lots and lots of refactoring, benchmarking, figuring out how the code works together and eventually trying to play to Falcon's strengths. I've read the thread from the beginning and your approach is sound - I'd call it Iterative Refactoring.
I think you also need to read the BadMood thread:
http://atari-forum.com/viewtopic.php?f=68&t=24561

There was an m68k & DSP engine from the '90s for viewing id's Doom WAD files, which didn't use any of the Doom (or Quake) code and which Douglas was originally involved with. He picked the work up again a couple of years ago, bolted the Doom 1 & 2 game logic code onto that rendering engine, and over the past 2 years rewrote & optimized the heck out of it, plus wrote WAD conversion code for the modified format that the engine nowadays uses (+ a CPU-based soft synthesizer for the MIDI music included in WADs).

More information on the BadMood engine is here:
http://devilsdoorbell.com/specialization/

There's quite a bit of similarity between the Doom & Quake game structures. To do a Quake rendering engine, Douglas doesn't really need the Quake code (except as a reference); he can use his own, and he knows how the id code works inside out. Right, Douglas? ;-)
VladR wrote: Different question: how do you exploit the DSP parallelism on the Falcon? I noticed you have previously done some benchmarking with regard to the bandwidth - e.g. whether it even makes sense to upload the data back and forth, considering the latency. I'd really enjoy it if you wrote a bit more about your experience :cheers:
For performance work tooling is really important.

When you have the right tools, they point out where the bottlenecks are & how large they are. And when one is really competent, one can just keep optimizing/redesigning those bottlenecks away and see continuous performance improvement. Simple! (but very time-consuming)

The tools used for that are Hatari's CPU & DSP profilers (they can also profile looping/polling, such as DSP waits):
http://hg.tuxfamily.org/mercurialroot/h ... #Profiling

There's also some scripting to produce profiles automatically, and extra tools Douglas has written specifically for Doom/Quake. The Doom (BadMood) thread listed above has examples of this.
VladR
Atariator
Posts: 18
Joined: Thu Mar 05, 2015 7:30 pm

Re: Quake 2 on Falcon030

Post by VladR »

EvilFranky wrote:VladR - AtariAge, that's where I recognise the username from.
That's the nick I have been using all my life across all forums, Mr. Evil. Not sure if your account is hacked, or this is really your 666th post, though I would not be surprised these days...

I am, however, pretty sure we do not need the AA nonsense here. This looks like a civilized forum, from what I've had a chance to see so far.

We should all revel in the fact that DML is a skilled and devoted coder willing to share the information and discoveries.

So, if possible, let's leave AA where it belongs :twisted:




EDIT : my iPad thought that by DML I meant XML...
Last edited by VladR on Fri Mar 27, 2015 12:02 pm, edited 1 time in total.
VladR
Atariator
Posts: 18
Joined: Thu Mar 05, 2015 7:30 pm

Re: Quake 2 on Falcon030

Post by VladR »

Eero Tamminen wrote:
I think you also need to read the BadMood thread:
http://atari-forum.com/viewtopic.php?f=68&t=24561

There was an m68k & DSP engine from the '90s for viewing id's Doom WAD files, which didn't use any of the Doom (or Quake) code and which Douglas was originally involved with. He picked the work up again a couple of years ago, bolted the Doom 1 & 2 game logic code onto that rendering engine, and over the past 2 years rewrote & optimized the heck out of it, plus wrote WAD conversion code for the modified format that the engine nowadays uses (+ a CPU-based soft synthesizer for the MIDI music included in WADs).

More information on the BadMood engine is here:
http://devilsdoorbell.com/specialization/

There's quite a bit of similarity between the Doom & Quake game structures. To do a Quake rendering engine, Douglas doesn't really need the Quake code (except as a reference); he can use his own, and he knows how the id code works inside out. Right, Douglas? ;-)



For performance work tooling is really important.

When you have the right tools, they point out where the bottlenecks are & how large they are. And when one is really competent, one can just keep optimizing/redesigning those bottlenecks away and see continuous performance improvement. Simple! (but very time-consuming)

The tools used for that are Hatari's CPU & DSP profilers (they can also profile looping/polling, such as DSP waits):
http://hg.tuxfamily.org/mercurialroot/h ... #Profiling

There's also some scripting to produce profiles automatically, and extra tools Douglas has written specifically for Doom/Quake. The Doom (BadMood) thread listed above has examples of this.
I am quite surprised how much profiling Hatari does! All the cycles, latencies and everything. That's a huge help. Of course, having tailored profiling code is always preferred, but I presume XML already has all that from his bad mood project.

Speaking of which, thanks for the link - I will go and read through it now to learn more about the optimizations.

I am just slightly confused about what you mean by similar code between Quake and Doom. From what I've heard, other people have said the two code bases are anything but similar?


And of course, let me applaud you for pushing the envelope on the Atari :cheers:
calimero
Fuji Shaped Bastard
Posts: 2639
Joined: Thu Sep 15, 2005 10:01 am
Location: Serbia

Re: Quake 2 on Falcon030

Post by calimero »

VladR wrote:
dml wrote: You must keep in mind however that we're talking about 1993 and this hadn't been done before in a game - ever. The only significant changes between 1993 and 1997 involved a transition from mono->colour lightmaps. This aside it appears to be much the same.
That might be an excuse for a small amateur indie team. Not for hardened veterans who had access to enormous finances after the success of Wolfenstein & Doom 1&2. You must realize they could have easily afforded to hire 50 people to work just on the editor/tools/radiosity solver, if they wanted.
In 1993 the "small amateur indie team" was id Software. Check this video: https://www.youtube.com/watch?v=HpEBUV_g9vU
I would say it would be a real mess to throw 50 programmers at a project with so much new, never-done-before stuff.
id in 1993 looked like a "small amateur indie team", but they were doing ground-shaking stuff :)

btw it is DML
and I found really good video about ID history: https://www.youtube.com/watch?v=7YreEwtV7D0
using Atari since 1986. ・ http://wet.atari.org ・ http://milan.kovac.cc/atari/software/ ・ Atari Falcon030/CT63/SV ・ Atari STe ・ Atari Mega4/MegaFile30/SM124 ・ Amiga 1200/PPC ・ Amiga 500 ・ C64 ・ ZX Spectrum ・ RPi ・ MagiC! ・ MiNT 1.18 ・ OS X
VladR
Atariator
Posts: 18
Joined: Thu Mar 05, 2015 7:30 pm

Re: Quake 2 on Falcon030

Post by VladR »

calimero wrote:
VladR wrote:
dml wrote: You must keep in mind however that we're talking about 1993 and this hadn't been done before in a game - ever. The only significant changes between 1993 and 1997 involved a transition from mono->colour lightmaps. This aside it appears to be much the same.
That might be an excuse for a small amateur indie team. Not for hardened veterans who had access to enormous finances after the success of Wolfenstein & Doom 1&2. You must realize they could have easily afforded to hire 50 people to work just on the editor/tools/radiosity solver, if they wanted.
In 1993 the "small amateur indie team" was id Software. Check this video: https://www.youtube.com/watch?v=HpEBUV_g9vU
True. They kept trying to keep that spirit for quite some time, but we all know how that pissing against the wind turned out...
calimero wrote:I would say it would be a real mess to throw 50 programmers at a project with so much new, never-done-before stuff.
id in 1993 looked like a "small amateur indie team", but they were doing ground-shaking stuff :)
I did not say throw 50 people at the whole project. Leave the engine work just to JC, yes. But throw 50 people at the toolset. Especially because it was ground-breaking stuff - i.e. research. Every time you are doing research, there are a few paths to explore. More people means you can explore those paths at the same time, since everyone follows a different path, and once the first results are in, a decision can be made about which path looks most promising. One guy will take 12 months to explore them all, while 4 guys will do it in ~4 months - that's all.

With more people, you can leave the UI and streamlining to others and just focus on improving the core technology.

I guess what I am trying to say is that DML is not the first guy to figure out the issues with their toolset. I must have read the same thing, over the last 2 decades, across dozens of coding forums, about 100-150 times, so I have a pretty strong opinion about it - ESPECIALLY considering they had all the money in the world.

Yes, the 50 people was an exaggeration, but history has shown time and again that more people on the toolset is always a good idea (and it was common knowledge in 1993 already - just read a few Gamasutra 'obituaries').
calimero wrote:btw it is DML
Oops, I was typing on an iPad, hence only the single quote from those posts, and a few things got auto-corrected (e.g. dml -> XML). My fault for not double-checking - thanks for pointing it out. Will go edit it right away.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

VladR wrote: Well, not in a literal sense. There's obviously lots and lots of refactoring, benchmarking, figuring out how the code works together and eventually trying to play to Falcon's strengths. I've read the thread from the beginning and your approach is sound - I'd call it Iterative Refactoring.
IR is definitely involved in areas such as BSP traversal and getting lightmaps lined up properly. It's quite difficult to get those things to match up by doing it from scratch because they rely on some details and assumptions which aren't always clear. There are some shared assumptions between surfaces (in the cache) and lightmaps for example which depend on a specific estimation. Get that wrong and you get strange texturing & lighting bugs.
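
To give a feel for the kind of shared estimation I mean, here's a rough sketch along the lines of the classic id-style surface extents calculation (illustrative C only - not lifted from my engine or from id's exact sources):

/* The "shared assumption" is usually an extents estimate like this:
   world-space texture coordinates are snapped to a 16-unit luxel grid,
   and BOTH the lightmap builder and the surface cache must round the
   same way, or textures and lights drift apart. */
#include <math.h>

typedef struct { float vecs[2][4]; } texinfo_t;   /* s/t axes + offsets */

void calc_surface_extents(const float (*verts)[3], int numverts,
                          const texinfo_t *tex,
                          int texturemins[2], int extents[2])
{
    float mins[2] = {  999999.0f,  999999.0f };
    float maxs[2] = { -999999.0f, -999999.0f };

    for (int i = 0; i < numverts; i++) {
        for (int j = 0; j < 2; j++) {
            float val = verts[i][0] * tex->vecs[j][0]
                      + verts[i][1] * tex->vecs[j][1]
                      + verts[i][2] * tex->vecs[j][2]
                      + tex->vecs[j][3];
            if (val < mins[j]) mins[j] = val;
            if (val > maxs[j]) maxs[j] = val;
        }
    }
    for (int j = 0; j < 2; j++) {
        int bmin = (int)floorf(mins[j] / 16.0f);
        int bmax = (int)ceilf (maxs[j] / 16.0f);
        texturemins[j] = bmin * 16;            /* lightmap origin in texture space */
        extents[j]     = (bmax - bmin) * 16;   /* lightmap samples = extents/16 + 1 */
    }
}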

So I used a lot of IR to get from a working version of those areas in C to an implementation that still works and is Falcon-friendly.

Of course this mainly applies to rendering from .BSP files - it's not an issue from custom data where the assumptions are flexible and can be set by me.

The rasterization stage borrows principles from that era (sort & scan, and the planar math approach to texturing) but the implementation was custom made without deriving from a C version, based on earlier work of mine, and with different capabilities (e.g. reserving the possibility of transparency without zbuffering, and related stuff).
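
For reference, the 'planar math' part comes down to the fact that 1/z, u/z and v/z are all affine across the screen for a planar face, so the rasterizer only needs per-face gradients plus an occasional divide. A minimal sketch of that setup (my own notation for illustration, not the engine's actual code):

/* For a face with view-space plane  dot(n, p) = dist  and texture mapping
   u = dot(s_axis, p) + s_off,  v = dot(t_axis, p) + t_off, the values
   1/z, u/z and v/z are affine in screen coordinates.  The eye ray through
   pixel (sx, sy) is r = ((sx-xc)/xscale, (sy-yc)/yscale, 1). */
typedef struct { float x, y, z; } vec3;

typedef struct {
    float invz, uoz, voz;        /* values at the screen centre (xc, yc) */
    float dinvz_dx, dinvz_dy;    /* per-pixel steps of 1/z               */
    float duoz_dx,  duoz_dy;     /* per-pixel steps of u/z               */
    float dvoz_dx,  dvoz_dy;     /* per-pixel steps of v/z               */
} face_gradients;

void setup_gradients(vec3 n, float dist,         /* view-space plane     */
                     vec3 s_axis, float s_off,   /* texture s mapping    */
                     vec3 t_axis, float t_off,   /* texture t mapping    */
                     float xscale, float yscale, /* projection scales    */
                     face_gradients *g)
{
    g->dinvz_dx = n.x / (dist * xscale);
    g->dinvz_dy = n.y / (dist * yscale);
    g->invz     = n.z / dist;

    g->duoz_dx  = s_axis.x / xscale + s_off * g->dinvz_dx;
    g->duoz_dy  = s_axis.y / yscale + s_off * g->dinvz_dy;
    g->uoz      = s_axis.z          + s_off * g->invz;

    g->dvoz_dx  = t_axis.x / xscale + t_off * g->dinvz_dx;
    g->dvoz_dy  = t_axis.y / yscale + t_off * g->dinvz_dy;
    g->voz      = t_axis.z          + t_off * g->invz;
    /* per span: step the three affine values, divide every 8 or 16 pixels
       to recover (u, v), and interpolate linearly in between. */
}
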
VladR wrote: I'm glad you're not totally hardcore, as some other Atarians, that want to do everything on the HW itself. While fun, it's woefully not productive, so I applaud your decision for the middle ground.
I think the HW-centric view is more of an Amiga thing, since they have more varied hardware to be bashing registers with and getting leverage that way. A lot of Atari stuff is equally low-level and devious but perhaps a bit more programming-centric.

I have done a bit of that from time to time (on ST) but I'm mostly interested in 3D - and register bashing on the Falcon is mostly limited to the DSP for me. It's mostly code in the end. There's a limit to the 3D leverage you can obtain via hardware on these machines.

But in the end, *anything* done on these old boxes gets my vote - low or high level. I enjoy it all. Main thing is to be creative and have some fun :-)
VladR wrote: Different question: how do you exploit the DSP parallelism on the Falcon? I noticed you have previously done some benchmarking with regard to the bandwidth - e.g. whether it even makes sense to upload the data back and forth, considering the latency. I'd really enjoy it if you wrote a bit more about your experience :cheers:
I discuss this a bit somewhere in the thread (can't find it right now - will edit with a link when I get time to dig for it), which covers the two main kinds of parallelism used - fine-grained and coarse-grained - and the process I (mostly) use for estimating what to use where.

However, much comes from other inputs:

- experience with the machine's profile - the time typically taken to move a piece of data from A to B, execute an instruction etc. Everyone gains that after a while on any given machine and it's pretty essential
- knowing what resources are available and when (obvious really! but code needs to be designed with this in mind always, to allow 'fitting' of blocks later). I find it's helpful to identify blocks (partly) by the resources they will steal and in what order, to help with overlapping later on.
- having a picture of which algorithms are bottlenecks and why (profiling, knowledge of the processes taking place)
- mapping algorithms or stages which need random access to main storage/RAM, vs algorithms which perform heavy computation on lightweight data (also algorithms which try to do both at once, which suggests a tightly coupled form of parallelism - as in the case of BSP traversal)
- profiling itself - mainly for confirmation, occasionally for discovery. It also helps with experiments but it's primarily a confirmation tool.

I used profiling a lot more on BadMooD because there was a huge discovery problem - the 3D engine was half-done in 1997, dropped and resurrected in 2012, partly rewritten, then glued onto Id's game code. Which is a special kind of mess. It required a lot of profiling-oriented discovery to make the best of it after all of that.

For the Q2 derivative there has been much less need for discovery because it was designed reasonably well before starting to write any code, and kept relatively simple, with a decent track of the costs with confirmation only (50% of which is done on real HW, 50% running in Hatari emulator).

For example I didn't bother to bring my sampling profiler over from BadMooD because there hasn't been so much of a need. I may do at some point but so far I've been doing fine without it. Hatari profiler has been enough to confirm that optimizations are beneficial or not.

I'd say the main 'bonus' the Hatari profiler provides for Q2 is occasional confirmation that parallelism itself is working as intended. It's a notoriously difficult thing to test, and often involves deliberate foils and specific expectations - this often works but not always, not in all cases. Hatari helps look into those cases without so much pain.

(That's probably less detail than you asked for but I'm short of time here ATM and will link some of the profiling bits later on).
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

Just a quick update this time:

- radiosity quality has been increased with multisampling at 2 additional stages (area lights, face inter-visibility)
- sunlight now has scattering
- lightmaps now have soft edges

Preprocessing for my test map increased from ~2 minutes to ~14 minutes with reasonably high settings for multisampling, which is fine for me. For a dense map it will probably be a long wait but it's not an issue for what I'm planning in the next vid/demo thingy.

Overall it's looking decent but I didn't have time to fix the sun placed outside of the skybox. I think I can see how to do it now though, so it will probably happen next. Might be Sunday or an evening next week before I get to look again.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Quake 2 on Falcon030

Post by Eero Tamminen »

VladR wrote:
Eero Tamminen wrote: The tools used for that are Hatari's CPU & DSP profilers (they can also profile looping/polling, such as DSP waits):
http://hg.tuxfamily.org/mercurialroot/h ... #Profiling
Otherwise the profiling code works well, but the Valgrind cachegrind format export isn't very polished:
http://hg.tuxfamily.org/mercurialroot/h ... egrind.png

(You'll notice this when navigating profiler disassembly output in the Kcachegrind Qt GUI.)
VladR wrote:
Eero Tamminen wrote: There's also some scripting to produce profiles automatically, and extra tools Douglas has written specifically for Doom/Quake. The Doom (BadMood) thread listed above has examples of this.
I am quite surprised how much profiling Hatari does! All the cycles, latencies and everything. That's a huge help.
A few years ago Atari had so little tooling for profiling that it was embarrassing. About the only thing was gprof, which is pretty much a joke compared to the tools available nowadays (you need to recompile everything with instrumentation, the instrumentation can distort small function costs a lot, and it still gives you information only on the program itself, not on the system side).

At my previous work I had been spoiled with good Linux tools (Valgrind callgrind/cachegrind, Oprofile & LTTng open source tools, and some really nice proprietary in-house tools). As I had a lot of free time just before Douglas started working again on BadMood, I decided to fix this gap and wrote profilers that give similar information on Atari. :-)

If I needed to do similar stuff for some other platform, I would probably target the Valgrind cachegrind format directly for callgraphs. For disassembly & debugging a Gdb server might be nice (it has many GUIs and can nowadays be fully scripted with Python), but for Hatari it wasn't a good fit because Gdb doesn't support the Falcon DSP and I don't know how easy it would be to add profiling data to its disassembly. IMHO it's important to provide the same UI for both CPU & DSP side debugging.

VladR wrote: Of course, having tailored profiling code is always preferred,
BadMood has Hatari debugger/profiler scripting to automatically dump profiles & callgraphs for the slowest frame during gameplay. I.e. you build a new version, invoke a script, and after it has run through a gameplay recording (= a few minutes), you automatically have profiles of the largest current bottlenecks on both the CPU & DSP sides.

The same also works with manual gameplay, but you need a really fast machine for things to be playable while Hatari emulates the Falcon (especially the DSP) and profiles (= disassembly of all executed addresses) are occasionally dumped to disk.
VladR wrote: but I presume XML already has all that from his bad mood project.
Yep. Both for memory usage and for profiling things on the real device. Hatari doesn't emulate the data cache (only the instruction cache), so a real Falcon is clearly faster with data-cache-friendly code. And Hatari's (WinUAE-derived) floating point emulation is ~2x faster than the instructions on the real device.
Last edited by Eero Tamminen on Fri Mar 27, 2015 9:06 pm, edited 1 time in total.
EvilFranky
Atari Super Hero
Posts: 926
Joined: Thu Sep 11, 2003 10:49 pm
Location: UK

Re: Quake 2 on Falcon030

Post by EvilFranky »

VladR wrote:
EvilFranky wrote:VladR - AtariAge, that's where I recognise the username from.
That's the nick I have been using all my life across all forums, Mr. Evil. Not sure if your account is hacked, or this is really your 666th post, though I would not be surprised these days...

I am, however, pretty sure we do not need the AA nonsense here. This looks like a civilized forum, from what I've had a chance to see so far.

We should all revel in the fact that DML is a skilled and devoted coder willing to share the information and discoveries.

So, if possible, let's leave AA where it belongs :twisted:




EDIT : my iPad thought that by DML I meant XML...
Not hacked, just an inappropriate coincidence! :)

Good to see your acknowledgement of the somewhat unforgiving hostility that presents itself on AA once in a while; it's nothing like that in here.

But you didn't really provide much of an introduction before posting what a few probably thought was a slightly aggressive cross-examination of Doug's work in progress! That, coupled with the fact that the only other forum I've seen you on was AA... triggered a few alarm bells, and we are very protective of our Falcon developers on here (Doug being a favourite due to his recent Doom conversion) :lol:

Anyway welcome aboard, looking forward to reading your contributions as you clearly have a technical background which can only add to the interest on threads like these :cheers:
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

I had a feeling that the first time I played with radiosity based lighting was actually on the Falcon itself, in 1994!

Have found the original project mixed up with other stuff, so now there is a vid:

https://www.youtube.com/watch?v=3oco6We6RcY


It's pretty ugly (and slow) but it was just me playing around back then. Probably after seeing Quake 1 and thinking 'that crazy Carmack has been busy again!' and wanting to experiment with something similar.

Anyway, I'll be back on topic soon :)
calimero
Fuji Shaped Bastard
Posts: 2639
Joined: Thu Sep 15, 2005 10:01 am
Location: Serbia

Re: Quake 2 on Falcon030

Post by calimero »

^
hey Doug, I remember this demo! I have it somewhere on CD...
using Atari since 1986. ・ http://wet.atari.org ・ http://milan.kovac.cc/atari/software/ ・ Atari Falcon030/CT63/SV ・ Atari STe ・ Atari Mega4/MegaFile30/SM124 ・ Amiga 1200/PPC ・ Amiga 500 ・ C64 ・ ZX Spectrum ・ RPi ・ MagiC! ・ MiNT 1.18 ・ OS X
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

So I've been intermittently working on 3 new maps for the next vid. A very different look from the stuff shown previously.

One map is pretty much done, minus a few details. The second is about half-way (mostly built, in fact, but a bit too heavy for 16MHz and needs trimming). The third I haven't started yet, but it will probably be small anyway.

The rad tool is now working correctly except for the sky/sun shadowing thing, which will be fixed next. I had a couple of extra ideas to improve the look of these maps via lighting so there might be a few more changes soon.

I found the Q2 map editor quite strange and initially a struggle, but after a couple of hours it became a breeze.

The stress factor from the 2nd map is also a good incentive for another round of optimization and precision work!
VladR
Atariator
Posts: 18
Joined: Thu Mar 05, 2015 7:30 pm

Re: Quake 2 on Falcon030

Post by VladR »

dml wrote: IR is definitely involved in areas such as BSP traversal and getting lightmaps lined up properly. It's quite difficult to get those things to match up by doing it from scratch because they rely on some details and assumptions which aren't always clear. There are some shared assumptions between surfaces (in the cache) and lightmaps for example which depend on a specific estimation. Get that wrong and you get strange texturing & lighting bugs.
Yes, that's the single biggest problem when working with someone else's codebase - it takes an enormous amount of effort, issues and pain to figure them all out, as they are usually undocumented and merely 'assumed'. Of course, if there were a way to document the initial thought process, we'd all arrive at those assumptions, but those initial R&D stages are rarely documented in the source code.
dml wrote: The rasterization stage borrows principles from that era (sort & scan, and the planar math approach to texturing) but the implementation was custom made without deriving from a C version, based on earlier work of mine, and with different capabilities (e.g. reserving the possibility of transparency without zbuffering, and related stuff).
Yeah, one can hardly expect good results from a PC-architecture rasterizer on the Falcon. I suppose that's where your prior experience with BadMooD helped a lot, as you had already had your first (and probably also second) go at 3D rasterizing on the Falcon and knew what worked and what didn't on the target architecture.
dml wrote: I have done a bit of that from time to time (on ST) but I'm mostly interested in 3D - and register bashing on the Falcon is mostly limited to the DSP for me. It's mostly code in the end. There's a limit to the 3D leverage you can obtain via hardware on these machines.
There's quite a bit more you can achieve on the Jaguar for 3D stuff via register bashing. Alas, not much texturing - though a few months ago, when I was playing with the Jag, I got the Blitter to do horizontal and vertical texture spans. So, while not intended by the HW design, one can eventually twist it. I still haven't touched the DSP yet, which is why I am very curious about your experience with the DSP on the Falcon.
dml wrote: However, much comes from other inputs:

- experience with the machine's profile - the time typically taken to move a piece of data from A to B, execute an instruction etc. Everyone gains that after a while on any given machine and it's pretty essential
- knowing what resources are available and when (obvious really! but code needs to be designed with this in mind always, to allow 'fitting' of blocks later). I find it's helpful to identify blocks (partly) by the resources they will steal and in what order, to help with overlapping later on.
- having a picture of which algorithms are bottlenecks and why (profiling, knowledge of the processes taking place)
- mapping algorithms or stages which need random access to main storage/RAM, vs algorithms which perform heavy computation on lightweight data (also algorithms which try to do both at once, which suggests a tightly coupled form of parallelism - as in the case of BSP traversal)
- profiling itself - mainly for confirmation, occasionally for discovery. It also helps with experiments but it's primarily a confirmation tool.
I am aware of those stages (from other projects), but I was secretly hoping you found some silver bullet to avoid the amount of work involved :D
dml wrote: For the Q2 derivative there has been much less need for discovery because it was designed reasonably well before starting to write any code, and kept relatively simple, with a decent track of the costs with confirmation only (50% of which is done on real HW, 50% running in Hatari emulator).
It's an incredible help that you can get away with running 50% of the builds on the emulator. That's not happening with the Jaguar emulators.
dml wrote: For example I didn't bother to bring my sampling profiler over from BadMooD because there hasn't been so much of a need. I may do at some point but so far I've been doing fine without it. Hatari profiler has been enough to confirm that optimizations are beneficial or not.
I only discovered the Hatari profiling features yesterday, but I am totally floored by them. They beat the features of enterprise profilers worth thousands of dollars that I've worked with. Brutally amazing!
dml wrote: I'd say the main 'bonus' the Hatari profiler provides for Q2 is occasional confirmation that parallelism itself is working as intended. It's a notoriously difficult thing to test, and often involves deliberate foils and specific expectations - this often works but not always, not in all cases. Hatari helps look into those cases without so much pain.
Wait, Hatari can give you details on the DSP vs CPU parallelism without your own profiling code? That feature alone makes me want to switch from the Jag to the Falcon...
dml wrote: (That's probably less detail than you asked for but I'm short of time here ATM and will link some of the profiling bits later on).
We had to start somewhere, and I greatly appreciate your patience and willingness to share the technical information. You are proof that unicorns are not a myth in Atari Land :angel:
VladR
Atariator
Posts: 18
Joined: Thu Mar 05, 2015 7:30 pm

Re: Quake 2 on Falcon030

Post by VladR »

dml wrote:Just a quick update this time:

- radiosity quality has been increased with multisampling at 2 additional stages (area lights, face inter-visibility)
- sunlight now has scattering
- lightmaps now have soft edges

Preprocessing for my test map increased from ~2 minutes to ~14 minutes with reasonably high settings for multisampling, which is fine for me. For a dense map it will probably be a long wait but it's not an issue for what I'm planning in the next vid/demo thingy.
I don't understand why the preprocessing time would rise by a factor of 7 due to multisampling. That's a small amount of computation compared to the radiosity. When I was applying a 3x3 kernel blur on the Jaguar (I have a flag so that certain textures can be blurred/filtered on loading at runtime), it was really fast even in high-level C code, purely on the CPU. Perhaps the sunlight has something to do with the time increase?
dml wrote:Overall it's looking decent but I didn't have time to fix the sun placed outside of the skybox. I think I can see how to do it now though, so it will probably happen next. Might be Sunday or an evening next week before I get to look again.
What exactly are you doing with the sunlight as a separate source? Are you perhaps integrating it into the radiosity solver and distributing the sun's energy across all patches? Visually, that could provide a separate coloring pass over the whole map - for example, some maps could have an evening theme, with a red hue on the walls that directly face the sun (and perhaps a bit of color bleeding onto the nearby walls).

You could do the same with darker maps and set the color to dark blue (as in night). It should mix well with all the other light sources.

Not sure if I am making sense now.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

VladR wrote:Yes, that's the single biggest problem when working with someone else's codebase - it takes an enormous amount of effort, issues and pain to figure them all out, as they are usually undocumented and merely 'assumed'. Of course, if there were a way to document the initial thought process, we'd all arrive at those assumptions, but those initial R&D stages are rarely documented in the source code.
Yep - working with someone else's code is not always fun. In fact it's usually terrible. But JC has the advantage of being a genius, and many of his 'oversights' and omissions turn out to not matter all that much when the original target requirements are considered. I found many optimizations which could be applied to the scene gather process and which were useful on the Falcon - but to say they would have been meaningful on a Pentium 133 target - maybe, but it's hard to say.

Even the radiosity problem I mentioned (approximating reflectivity from whole texture vs area actually used by surface). Perhaps it was a deliberate compromise, to avoid discontinuities between BSP-chopped surfaces on the same plane? I don't know, but there is a reasonable chance he thought about this (or even tested it) and decided the other way caused fewer visible errors even if it was less correct. Or maybe it is a real oversight. Kind of hard to ask him now.

I was in touch with him briefly but sadly not about anything that interesting :)
VladR wrote: Yeah, one can hardly expect good results from a PC-architecture rasterizer on the Falcon. I suppose that's where your prior experience with BadMooD helped a lot, as you had already had your first (and probably also second) go at 3D rasterizing on the Falcon and knew what worked and what didn't on the target architecture.
Yes - by the 2nd or 3rd time you do something, it begins to get close to the right solution :)

In fact I had stopped Falcon development for a long time - last touched it around 1997-ish, left and started playing with it again in 2012. Reviving BadMooD was a chance to recover from 'rusty' to being able to code on it properly again.
VladR wrote: There's quite a bit more you can achieve on the Jaguar for 3D stuff via register bashing. Alas, not much texturing - though a few months ago, when I was playing with the Jag, I got the Blitter to do horizontal and vertical texture spans. So, while not intended by the HW design, one can eventually twist it. I still haven't touched the DSP yet, which is why I am very curious about your experience with the DSP on the Falcon.
That's true. The Jag is quite unexplored still. It's tough to crack initially but spend enough time on it and those problems gradually go away. It is a pity the original dev tools were not so great but I think a lot of that stuff is getting fixed now by the homebrew community?

I was last near it in 1997, worked on one project but didn't complete it due to business related woes.
VladR wrote: I am aware of those stages (from other projects), but I was secretly hoping you found some silver bullet to avoid the amount of work involved :D
Ha! No. I'm afraid I don't have any such magic bullet. Decent planning is the only way to reduce the effort involved in the rest of it.

90% thinking it through, 10% typing!
VladR wrote: It's an incredible help that you can get away with running 50% of the builds on the emulator. That's not happening with the Jaguar emulators.
It cuts typical turnaround time to about 1/3rd for executables, and a lot more for big data updates (well, those are free since there is host folder sharing with the emulated machine). That's very helpful!

Having said that, Falcon emulation is not exact. It is probably far superior to Jag emulation but it's a very hard machine to emulate, much more so than ST. Some things you just don't try to do, or don't rely on to keep safe.

I have to regularly check executables on a real machine otherwise the dreaded black screen will surface and it's then a trawl back through source control to find the cause. :)

I played with Virtual Jaguar a few times, to run my old COF files and got about half of them working properly, a few *mostly* working and a few wouldn't run at all. I sent them off to the emulator devs to see if it would help them debug it, but I have no idea if they did actually do that :)
VladR wrote: I only discovered the Hatari profiling features yesterday, but I am totally floored by them. They beat the features of enterprise profilers worth thousands of dollars that I've worked with. Brutally amazing!
Most commercial profilers are just s#*t.

I think the best one (other than Hatari!) I played with was the SN Systems one on the PS2. It was mighty expensive and you needed one of those giant Linux-based devkit boxes from Sony. So homebrew = no chance! But the guys who built the profiler clearly also worked on games, because it did what you wanted...
VladR wrote: Wait, Hatari can give you details on the DSP vs CPU parallelism without your own profiling code? That feature alone makes me want to switch from the Jag to the Falcon...
I have to say, Eero has pretty much added to Hatari debugger/profiler everything I asked for. That's about 5 things more than I probably deserved to get added :)

TBH it was a great profiler to begin with but some less common things were laborious to do without some changes and he always magically delivered 'tweaks'. (Thanks Eero! :D )

Parallelism 'visibility' isn't a direct function of Hatari - I was estimating this by looking at blockages on one side or the other, where one processor or the other 'spins' for access. These spins show up as spikes in the profiled address counters.

Eero provided some extra functionality which enumerates these sites automatically using heuristics, so they aren't so easily missed when scanning the profiler output.

The profiling of CPU+DSP simultaneously is sometimes very useful. I will normally do it separately but having that capability is important when dealing with some weird cases.

Re: Switching from Jag - it's worth considering if you're just doing it for the interest factor. I'm sure the Falcon community will be happy to have new stuff surfacing :)

The Jag probably has at *least* as much to uncover, and quite a few of the seasoned ST people are doing things on it now - so there are plenty of technical reasons to stick with it. But if you want a good community to bounce ideas around (instead of bouncing each other around), it's probably here. :D
VladR wrote: We had to start somewhere, and I greatly appreciate your patience and willingness to share the technical information. You are proof that unicorns are not a myth in Atari Land :angel:
You're welcome.

I have to say that around here are many excellent people who know their stuff - in one area or another. Quite a few of them don't post much or just lurk, but they are around - and can be willing to help/contribute if needed. Some are from a deeper layer of the scene and some not so easy to summon, but it's possible if you make the right noises ;-)

You'll also find quite a few other projects in progress on AF which are worth checking out, including everything else done by those authors, some of whom straddle the various scene layers too. ;)

That's enough from me - mammoth posts again. Need to do stuff...
BFN
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

(I'll keep this one brief)
VladR wrote:I don't understand why the preprocessing time would rise by a factor of 7 due to multisampling. That's a small amount of computation compared to the radiosity. When I was applying a 3x3 kernel blur on the Jaguar (I have a flag so that certain textures can be blurred/filtered on loading at runtime), it was really fast even in high-level C code, purely on the CPU. Perhaps the sunlight has something to do with the time increase?
Yes, the extra load is entirely down to the addition of a sun with scattering. The sun is sampled 256 times per evaluation in a Gaussian cloud (well, the sample count is passed via the command line) and accounts for most of the performance change. In relative terms it's a lot, but in terms of the visual gain for paying an extra 10 minutes, for me it's a net gain :)

Some of the cost is also due to the sun being 'not physically in the map' and therefore not testable via PVS, cluster-to-cluster. This doesn't make a big difference unless the map has a closed topology, but it still counts.
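
Roughly, the sun sampling amounts to something like this (a simplified sketch with made-up names, not the actual tool code - trace_visible() stands in for the BSP/PVS ray test):

/* Each lightmap sample fires N rays towards sun positions jittered in a
   Gaussian cloud around the nominal sun direction; the visible fraction
   scales the sun's contribution (soft shadows + scattering). */
#include <math.h>
#include <stdlib.h>

#define TWO_PI 6.28318530718f

typedef struct { float x, y, z; } vec3;

extern int trace_visible(vec3 from, vec3 to);   /* 1 if unoccluded (assumed helper) */

static float gauss(void)                        /* ~N(0,1), Box-Muller */
{
    float u1 = (rand() + 1.0f) / ((float)RAND_MAX + 2.0f);
    float u2 = (rand() + 1.0f) / ((float)RAND_MAX + 2.0f);
    return sqrtf(-2.0f * logf(u1)) * cosf(TWO_PI * u2);
}

/* returns a 0..1 sunlight factor for one lightmap sample point */
float sample_sun(vec3 point, vec3 normal, vec3 sun_dir,
                 float sun_distance, float jitter_radius, int samples)
{
    float ndotl = normal.x*sun_dir.x + normal.y*sun_dir.y + normal.z*sun_dir.z;
    if (ndotl <= 0.0f || samples <= 0)
        return 0.0f;

    int hits = 0;
    for (int i = 0; i < samples; i++) {
        vec3 target = {
            point.x + sun_dir.x * sun_distance + gauss() * jitter_radius,
            point.y + sun_dir.y * sun_distance + gauss() * jitter_radius,
            point.z + sun_dir.z * sun_distance + gauss() * jitter_radius
        };
        if (trace_visible(point, target))
            hits++;
    }
    return ndotl * (float)hits / (float)samples; /* scale the sun colour by this */
}
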
VladR wrote:What exactly are you doing with the sunlight as a separate source? Are you perhaps integrating it into the radiosity solver and distributing the sun's energy across all patches? Visually, that could provide a separate coloring pass over the whole map - for example, some maps could have an evening theme, with a red hue on the walls that directly face the sun (and perhaps a bit of color bleeding onto the nearby walls).
You could do the same with darker maps and set the color to dark blue (as in night). It should mix well with all the other light sources.
Yes, exactly - this is what I've been trying to get with these changes. The 'sun' is just crafted from a pointlight with infinite distance, a jittered position, and its own colour etc. The skybox faces also emit energy and colour the scenery which is not in direct sunlight - so you get warm patches of sun and cool shadows, and some bleeding between cases.

It seems to be working now, but the primary textures are too noisy for the Falcon's 16-bit colour mode, so I'll be doing a bit more work on the maps to make it easier on the eyes.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3999
Joined: Sun Jul 31, 2011 1:11 pm

Re: Quake 2 on Falcon030

Post by Eero Tamminen »

VladR wrote:I only discovered the Hatari profiling features yesterday, but I am totally floored by them. They beat the features of enterprise profilers worth thousands of dollars that I've worked with. Brutally amazing!
Implementing a profiler in an emulator for these kinds of old/small systems is much simpler than on the real device, because things done in the emulator aren't visible to the emulated system, and host systems are nowadays so much more powerful. So the profiler doesn't need to be very clever; it can just brute-force things:

When profiling is started (= emulation is continued with profiling enabled), the profiler just allocates a device-memory-sized array (for each memory area) to keep track of the number of executed instructions etc. for each memory address. This information is taken from the CPU/DSP core emulation and updated after every instruction. When profiling is stopped (= the debugger/script is re-entered), that data can be investigated & saved. Sorting the collected data array(s) by different criteria is trivial.
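
In pseudo-C, the core of it is nothing more than this kind of per-address bookkeeping (a simplified sketch; the names are illustrative, not Hatari's real internals):

/* One counter slot per emulated address: the CPU/DSP core calls
   profile_update() after every instruction, and on stop the array is
   sorted and dumped. */
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t count;      /* times this address was executed */
    uint32_t cycles;     /* cycles spent at this address    */
} profile_slot;

static profile_slot *slots;              /* one per address in the profiled area */
static uint32_t area_base, area_size;

void profile_start(uint32_t base, uint32_t size)
{
    area_base = base;
    area_size = size;
    slots = calloc(size, sizeof(*slots)); /* host RAM is cheap next to 14MB of Falcon space */
}

/* called by the emulation core after each executed instruction */
void profile_update(uint32_t pc, uint32_t insn_cycles)
{
    if (pc - area_base < area_size) {
        slots[pc - area_base].count++;
        slots[pc - area_base].cycles += insn_cycles;
    }
}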

Hatari already had CPU & DSP disassemblers, so these only needed to be modified to return the disassembled line as a string (to the profiler). With these, the profiler could output & save disassembly together with the profile information (the profiler calls the disassemblers only for memory addresses that were executed during profiling). This is already quite useful both for debugging and for performance analysis of more complex code (especially code you're unfamiliar with). Even with just instruction counts it tells you how many times functions and (e.g. IO wait) loops get called, what code isn't called at all, or what gets called unexpectedly (e.g. interrupt handlers)...

Adding something like that to a Jaguar emulator should be pretty straightforward, if it already includes disassemblers/debuggers for the relevant chips in the machine.

Collecting the data needed for callgraphs is more complicated, so I'm not going to discuss it here except to mention that it tracks calls through a couple of different means:
- checking whether current address matches list of (symbol) addresses you've loaded to the debugger
- instruction types

Things like BadMood's automated "worst frame profiling" rely on Hatari debugger scripting & breakpoint utilities (the breakpoint for profile dumping gets triggered on a frame if it took more cycles or instructions than the worst frame so far, and the new dump overwrites the old one).
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

Got basic code overlays working now so it should be possible to optimize all stages of the DSP code to use fast memory, and later to release much of the space used by code (since an arbitrary amount of it can now be kept on the CPU side until needed).
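
In outline, the overlay scheme is just a table of DSP P: memory images kept on the CPU side and uploaded before the stage that needs them runs - something like this sketch (dsp_upload_pmem()/dsp_run_stage() are placeholders for whatever transfer routine is used, not real TOS/XBIOS calls, and the numbers are made up):

/* CPU-side overlay management: each DSP render stage keeps its 24-bit
   P: memory words in ST RAM; before a stage runs, its code is copied
   into a shared overlay region of internal DSP RAM. */
#include <stdint.h>

#define NUM_STAGES   4
#define OVERLAY_ORG  0x0080          /* P: address shared by all overlays (illustrative) */

typedef struct {
    const uint32_t *words;           /* 24-bit opcodes, one per 32-bit host word */
    uint16_t        count;           /* number of P: words                       */
} dsp_overlay;

extern void dsp_upload_pmem(uint16_t dest, const uint32_t *words, uint16_t count);
extern void dsp_run_stage(uint16_t entry);

static dsp_overlay overlays[NUM_STAGES]; /* filled at load time from the DSP overlay images */
static int resident = -1;                /* which stage currently lives in DSP RAM */

void run_stage(int stage)
{
    if (stage != resident) {             /* only re-upload when the stage changes */
        dsp_upload_pmem(OVERLAY_ORG, overlays[stage].words, overlays[stage].count);
        resident = stage;
    }
    dsp_run_stage(OVERLAY_ORG);
}
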
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

2 of 4 stages are now using code overlays on the DSP, so I don't think the other 2 stages should pose a problem.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

Last night I had all 4 stages executing from internal DSP ram using overlays, and releasing the initial DSP memory used to load the overlays back to the engine for buffers etc. I didn't have time to move things around and optimize with that, but it can be done next.

Eero has given me a patch for the Hatari profiler which will hopefully allow the profiler to be 'blanked' by the CPU side for all but one overlay at a time, so it may be possible to profile code while DSP overlays are being used, without ending up with a mess. Not quite in a position to try this yet but will do so once the overlay changes are settled down.
Mindthreat
Captain Atari
Posts: 279
Joined: Tue Dec 16, 2014 4:39 am

Re: Quake 2 on Falcon030

Post by Mindthreat »

w00t! :D

:cheers:
Atari-related YouTube Videos Here: - https://www.youtube.com/channel/UCh7vFY ... VqA/videos
Atari ramblings on Twitter Here: https://twitter.com/mindthreat
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

:)

Thought of another experiment to try later - texturing directly from the lightmaps (no base texture) for certain surfaces. This won't make them draw faster, but it removes the need for a surface in the cache, so they can be used as stand-ins until the surface becomes available. This allows surface building to be deferred/spread out more over time, and also overlapped with some DSP step during the next frame.

Instead of a base texture, the temporary face will use a single average colour from that texture to modulate the lightmap (basically 256x lightmap palettes).
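
In sketch form, the modulation table is just 256 pre-multiplied entries in the Falcon's 16-bit RGB565 truecolour format, one per lightmap intensity (illustrative code, not the engine's):

/* For a "lightmap-only" stand-in face: pre-multiply the texture's average
   colour by every possible lightmap intensity once, then spans just look
   up table[lightmap_sample] - no surface cache entry needed. */
#include <stdint.h>

void build_lightmap_palette(uint8_t avg_r, uint8_t avg_g, uint8_t avg_b,
                            uint16_t table[256])
{
    for (int light = 0; light < 256; light++) {
        unsigned r = (avg_r * light) >> 8;   /* 0..255 */
        unsigned g = (avg_g * light) >> 8;
        unsigned b = (avg_b * light) >> 8;
        table[light] = (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
    }
}

/* inner span loop for a stand-in face */
void draw_lightmap_span(uint16_t *dst, const uint8_t *lightmap,
                        const uint16_t table[256], int count)
{
    while (count--)
        *dst++ = table[*lightmap++];
}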

This policy, combined with some other similar things should help smooth out surface cache building over the narrow Falcon bus while the camera is in motion.
dml
Fuji Shaped Bastard
Posts: 3991
Joined: Sat Jun 30, 2012 9:33 am

Re: Quake 2 on Falcon030

Post by dml »

I have rewritten the innermost areas of the span rendering on the DSP side, which allowed the CPU side to be optimized more thoroughly, so there is another measurable speedup for more complex scenery, especially when the camera is not moving at speed. It's still not a 'playable' framerate in many scenarios, but it is getting much closer to that.

I think it is already fast enough to drive maps/scenery that can be crafted specifically for the Falcon (e.g. the kind of thing that was done for console games using idTech 1/2 derivatives). It does struggle with PC content with full texturing, and that's likely to remain the case...

Some of the more extreme optimizations have not been made yet - e.g. using flat shading or lightmap-only texturing for certain types of faces under different circumstances (grazing angles, shards, distant faces, non-ready surfaces etc.). These could be enough to tip the balance later, for medium-dense content. I won't do that until the fully-textured part has been mostly exhausted.
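
That kind of per-face decision would boil down to something like this (thresholds and names are invented for illustration, not taken from the engine):

/* Pick a cheaper drawing mode for faces that won't benefit from full
   texturing; all threshold values are made-up illustrative numbers. */
typedef enum { DRAW_TEXTURED, DRAW_LIGHTMAP_ONLY, DRAW_FLAT } draw_mode;

typedef struct {
    float view_cos;       /* |dot(face normal, view dir)|, ~0 = grazing */
    float distance;       /* distance from the camera                   */
    float screen_area;    /* projected area in pixels                   */
    int   surface_ready;  /* composited surface present in the cache?   */
} face_info;

draw_mode choose_face_mode(const face_info *f)
{
    if (!f->surface_ready)        return DRAW_LIGHTMAP_ONLY; /* stand-in until cached */
    if (f->screen_area < 64.0f)   return DRAW_FLAT;          /* shards                */
    if (f->view_cos    < 0.18f)   return DRAW_FLAT;          /* grazing angles        */
    if (f->distance    > 2048.0f) return DRAW_LIGHTMAP_ONLY; /* distant faces         */
    return DRAW_TEXTURED;
}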

I have most of what I wanted - not quite everything - for making a new vid with visible changes, but there's still a bit left to do.
