Quake 2 on Falcon030

Scarlettkitten · Post by **Scarlettkitten** » Thu Apr 02, 2015 2:27 pm

Brill can't wait to see, keep it up Doug

dml · Post by **dml** » Thu Apr 02, 2015 4:01 pm

This one's been a bit of a longish, boring wait to see changes but shouldn't be much longer. I do want to get transparency working again using the latest version and then will start on a vid.

I was hoping to get brush models (doors and stuff) and/or game objects going but things aren't moving along fast enough to include either of those so will leave it for the next round...

VladR · Post by **VladR** » Mon Apr 06, 2015 6:17 pm

Sorry for the late reply ! I had to wait till I could get a proper amount of time to respond given the amount of info you provided

Eero Tamminen wrote: Tools used for that are Hatari's CPU & DSP profilers (can profile also looping/polling, such as DSP waits):
http://hg.tuxfamily.org/mercurialroot/h ... #Profiling

I noticed there are stats for which addresses got touched most and which code (with detailed percentage, cycle count and cache misses). Nice !

Eero Tamminen wrote: At my previous work I had been spoiled with good Linux tools (Valgrind callgrind/cachegrind, Oprofile & LTTng open source tools, and some really nice proprietary in-house tools). As I had a lot of free time just before Douglas started working again on BadMood, I decided to fix this gap and wrote profilers that give similar information on Atari.

I actually don't know what to say here. Many of us find it easy to put effort into something visible, be it a demo or a game, but effort spent on the toolchain is extremely appreciated, as people rarely find the toolchain 'exciting' enough.

Eero Tamminen wrote: If I would need to do similar stuff for some other platform, I would probably target Valgrind cachegrind format directly for callgraphs. For disassembly & debugging Gdb server might be nice (it has many GUIs and can nowadays be fully scripted with Python), but for Hatari it wasn't good fit because Gdb doesn't support Falcon DSP and I don't know how easy it would be to add profiling data to its disassembly. IMHO it's important to provide same UI for both CPU & DSP side debugging.

Well, right now, on the other platform (Jag), I am using Windows dev env. Which compiler toolchain are you guys using for C/Asm ? gcc ? I guess I would have to make the switch to linux, correct ?

Eero Tamminen wrote: BadMood has Hatari debugger/profiler scripting to automatically dump profiles & callgraphs, on slowest frame during game play. I.e. you build new version, invoke a script and after it runs through a game play recording (= few minutes), you have automatically profiles of the largest current bottlenecks both on CPU & DSP sides.

That sounds exactly like the set-up I have been using for profiling all my life

Eero Tamminen wrote: Yep. Both for memory usage and for profiling things on the real device. Hatari doesn't emulate data cache (only instruction cache), so real Falcon is clearly faster with data cache friendly code. And Hatari's (WinUAE derived) floating point emulation is ~2x faster than instructions on real device.

How much data cache does Falcon have, again ? As for the FP-emu, as long as there is a reliable and known coefficient (for the given machine), there is no problem with the difference.

shoggoth · Post by **shoggoth** » Mon Apr 06, 2015 7:18 pm

VladR wrote:How much data cache does Falcon have, again ? As for the FP-emu, as long as there is a reliable and known coefficient (for the given machine), there is no problem with the difference.

http://en.wikipedia.org/wiki/Motorola_68030

VladR · Post by **VladR** » Mon Apr 06, 2015 9:31 pm

dml wrote: Yes the extra load is entirely down to the addition of a sun with scattering. The sun is sampled 256 times per evaluation in a gaussian cloud (well, samplecount passed via commandline) and accounts for most of the performance change. In relative terms it's a lot, but in terms of visual gain for me for paying an extra 10 minutes it's a net gain

What PC are you using to generate the lightmaps ? Does it make a big visual impact if you reduce the samples to 32 ?

dml wrote:Yes exactly - this is what I've been trying to get with these changes. The 'sun' is just crafted from a pointlight with infinite distance and a jittered position, and its own colour etc. The skybox faces also emit energy and colour the scenery which is not in direct sunlight - so you get warm patches of sun and cool shadows, and some bleeding between cases.

It seems to be working now but the primary textures are too noisy for the Falcon's 16bit colour mode so I'll be doing a bit more work on the maps to make it easier on the eyes.

Yes, the contrast of 'warm patches' vs 'shadows' is what I had in mind. That should make the maps look much better!

What is wrong with the 16-bit color mode ? Not enough colors for smooth gradients ?

Eero Tamminen · Post by **Eero Tamminen** » Mon Apr 06, 2015 10:33 pm

VladR wrote:
Eero Tamminen wrote: If I would need to do similar stuff for some other platform, I would probably target Valgrind cachegrind format directly for callgraphs. For disassembly & debugging Gdb server might be nice (it has many GUIs and can nowadays be fully scripted with Python), but for Hatari it wasn't good fit because Gdb doesn't support Falcon DSP and I don't know how easy it would be to add profiling data to its disassembly. IMHO it's important to provide same UI for both CPU & DSP side debugging.
Well, right now, on the other platform (Jag), I am using Windows dev env. Which compiler toolchain are you guys using for C/Asm ? gcc ? I guess I would have to make the switch to linux, correct ?

Douglas, I guess you're using same setup for Quake code as for BadMood?

For building BadMood C code, I think Douglas was using Vincent's GCC 4.x Windows cross-compiler for Atari:
http://vincent.riviere.free.fr/soft/m68k-atari-mint/

(I was using native Atari GCC 2.x compiler, running in Aranym emulator, for building BadMood binary.)

For m68k assembly code, Douglas used GCC for inline assembly, and Vasm assembler for standalone m68k code. I think Douglas builds Vasm for Windows from sources:
http://sun.hasenbraten.de/vasm/

Symbols for the profiler are extracted with Atari "nm" tool from the "a.out" format BadMood Atari binary, with some filtering.

For DSP code, asm56000 compiler + cldlod (Motorola/Atari) tools are used to produce LOD files. Tool for converting LOD to binary file is included into BadMood sources (attached, original done by Miro with some mods by me).

Profiling BadMood needs (in addition to Doom WAD, BadMood binary & symbols) just Hatari, its python scripts for processing the profile data and attached shell script to run Hatari with suitable options and set up chained breakpoints to collect the information. Shell script requires Bash, I assume it could work also on Windows with Cygwin.

For viewing GraphViz format callgraphs, I use this:
https://github.com/jrfonseca/xdot.py

VladR · Post by **VladR** » Tue Apr 07, 2015 2:17 pm

[Slowly catching up on the thread, sorry for the delay]

Eero Tamminen wrote:Implementing profiler in an emulator for these kind of old/small systems is much simpler than on real device, because things done in emulator aren't visible to the emulated system and host systems are nowadays so much more powerful. So profiler doesn't need to be very clever, it can just brute-force things:

I like the fact that you are trying to make it look like it was not a big effort at all. Yeah, right

But yes, having several orders of magnitude more power at least makes the coding effort much simpler, which in the end results in more features in less time.

What language did you code it in ? C / C++ ? No idea what the hatari's high-level functionality (except for an actual emulation, which must be in ASM, of course - but I'd reckon the higher level processing loop does not have to be in ASM) has been coded in...

Eero Tamminen wrote:When profiling is started (= emulation continued with profiling enabled), profiler just allocates device memory sized array (for each memory area) for keeping track of number of executed instructions etc, for each memory address. This information is taken from the CPU/DSP core emulation and updated after every instruction. When profiling is stopped (= debugger/script is re-entered), that data can be investigated & saved. Sorting the collected data array(s) based on different criteria is trivial.

Looks like you either have some sort of callback chaining in place or just called your function directly in the Hatari source ?

Eero Tamminen wrote:Hatari had already CPU & DSP disassemblers, so these needed only to be modified to return the disassembled line as string (to profiler). With these the profiler could output & save disassembly with profile information (profiler calls disassemblers only for memory addresses that were executed during profiling). This already is quite useful both for debugging and performance analysis of more complex code (especially code you're unfamiliar with). Even with just instruction counts it would tell how many times functions and (e.g. IO wait) loops get called, what code isn't called at all, or gets unexpectedly called (e.g. interrupt handlers)...

Yeah, that one is an easy fix, once you are in the codebase. So, before you enhanced the profiler, did it at least provide the instruction count, or did you also add that yourself ?

Eero Tamminen wrote:Adding something like that to Jaguar emulator should be pretty straightforward, if it already includes disassemblers/debuggers for the relevant chips in the machine.

I'm not sure I would even go that route, myself. Maybe, if there was a source code in C or something, but I'm pretty sure I'd rather (assuming I'd even do that in the first place) do the profiling analysis on the PC in something like C++ / C#. Example - for a jag - I wrote a translator from Visual Basic (the modern .NET version, not the old Atari Basic version) into C, which then gets pushed into the regular compiler toolchain, resulting in binary executable on jag.

Writing a basic external profiler in something like .NET seems easy enough - I suppose most of the work would be on the disassembler side of things - but if I could write a disassembler on 8-bit Atari in Atmas, it should be a piece of cake in a higher-level language. But right now, I'm not at the point of maximizing the performance throughput on jag yet - I'm still in the early discovery phases...

But your post definitely made me re-realize the importance of external profilers. So, when the time comes, I will go write the external tool, as for a small upfront time investment, it will provide invaluable information on the cycle count / callgraph, automatically upon each build.

I must say, I really appreciate the brainstorming that we got going on here

Eero Tamminen · Post by **Eero Tamminen** » Tue Apr 07, 2015 6:21 pm

VladR wrote:
Eero Tamminen wrote:Implementing profiler in an emulator for these kind of old/small systems is much simpler than on real device, because things done in emulator aren't visible to the emulated system and host systems are nowadays so much more powerful. So profiler doesn't need to be very clever, it can just brute-force things:
I like the fact that you are trying to make it look like it was not a big effort at all. Yeah, right

More advanced features were a lot of effort, but the basic profiling functionality really is pretty simple and not that much of code as emulator already had features & infrastructure that profiler could use.

VladR wrote:What language did you code it in ? C / C++ ? No idea what the hatari's high-level functionality (except for an actual emulation, which must be in ASM, of course - but I'd reckon the higher level processing loop does not have to be in ASM) has been coded in...

Hatari is 100% C-code. Some of the C-code is converted from x86 assembly (Winston's Atari bitplane conversion routines) and from C++ (e.g. Aranym's Videl & DSP emulation code that were used as starting point for Hatari versions).

VladR wrote:Looks like you either have some sort of callback chaining in place or just called your function directly in the Hatari source ?

When debugger is exited, it sets flags on what features are enabled and whether CPU & DSP emulation mainloops should call debugger. Debugger callback then calls relevant debugger features (e.g. profiler and breakpoint) after every instruction.

VladR wrote:Yeah, that one is an easy fix, once you are in the codebase. So, before you enhanced the profiler, did it at least provide the instruction count, or did you also add that yourself ?

Instruction counting is basically just "address_array[pc/2]++" every time emulation calls profiler (through debugger). Program counter is divided by 2 as instructions are in m68k only on even addresses.
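To make the counting concrete, here is a minimal sketch in C. It is not Hatari's actual code: `profile_t`, `profile_init`, `profile_count`, `profile_hottest` and `profile_free` are hypothetical names, and only the `counts[pc / 2]++` step comes from the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical profiler state: one counter per possible instruction address. */
typedef struct {
    uint32_t *counts;   /* counts[pc / 2] = times the instruction at pc ran */
    uint32_t  mem_size; /* size of the profiled memory area in bytes */
} profile_t;

/* Allocate a zeroed counter array covering mem_size bytes of emulated RAM. */
int profile_init(profile_t *p, uint32_t mem_size)
{
    p->counts = calloc(mem_size / 2, sizeof *p->counts);
    p->mem_size = mem_size;
    return p->counts != NULL;
}

/* Called once per emulated instruction with the current program counter. */
void profile_count(profile_t *p, uint32_t pc)
{
    /* m68k instructions only start on even addresses */
    assert((pc & 1) == 0 && pc < p->mem_size);
    p->counts[pc / 2]++;
}

/* Return the address of the most frequently executed instruction. */
uint32_t profile_hottest(const profile_t *p)
{
    uint32_t best = 0;
    for (uint32_t i = 1; i < p->mem_size / 2; i++)
        if (p->counts[i] > p->counts[best])
            best = i;
    return best * 2;
}

/* Release the counter array. */
void profile_free(profile_t *p)
{
    free(p->counts);
    p->counts = NULL;
}
```

Halving the program counter keeps the array at half the memory size, and sorting or scanning it afterwards (as `profile_hottest` does) is all that is needed to find hot spots.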

VladR wrote:But your post definitely made me re-realize the importance of external profilers. So, when the time comes, I will go write the external tool, as for a small upfront time investment, it will provide an invaluable information on the cycle count / callgraph, automatically upon each build.

Getting useful callgraphs is a lot of work, both for tracking what the emulated code does and for processing that data. If you're interested, open a new thread in the Hatari subforum; discussing that takes more time, and this is already off-topic for the Quake discussion.

dml · Post by **dml** » Tue Apr 07, 2015 6:40 pm

VladR wrote:What PC are you using to generate the lightmaps ? Does it make a big visual impact if you reduce the samples to 32 ?

It's quite an old PC now by 'developer' standards but still decent. It's a 3GHz quad-core i7 from around 7 years ago. Plenty good for this stuff, which must have been painful on a Pentium III...

32-64 samples is probably enough for my test map, although with some open issues in the rad tool still, I haven't bothered to find a sweet spot for that. It also depends on the amount of haze wanted in the shadow edges.

It wouldn't be difficult to make it adaptively track error below some visible threshold and 'find' the ideal sample count for each point but I would probably spend the time on it only if I actually found myself waiting on the tool to finish without anything else to do - e.g. with bigger/denser maps. If it becomes the case I'll probably do a bit more on that side of it.
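One way such adaptive tracking could look, as a rough sketch only (this is not code from the rad tool): keep sampling until the standard error of the running mean falls below a threshold. The callback type `sample_fn`, the function `adaptive_mean` and the two demo samplers are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns one light sample for the point being lit. */
typedef double (*sample_fn)(void *ctx, int index);

/* Draw samples until the standard error of the mean drops below max_err,
 * or until max_samples is reached. Returns the mean; *used receives the
 * number of samples actually taken. */
double adaptive_mean(sample_fn fn, void *ctx, int min_samples, int max_samples,
                     double max_err, int *used)
{
    assert(fn != NULL && min_samples >= 2 && max_samples >= min_samples);
    double mean = 0.0, m2 = 0.0; /* Welford running mean and squared deviations */
    int n = 0;
    while (n < max_samples) {
        double x = fn(ctx, n);
        n++;
        double d = x - mean;
        mean += d / n;
        m2 += d * (x - mean);
        if (n >= min_samples) {
            double var = m2 / (n - 1);
            /* compare squared values so no sqrt() is needed */
            if (var / n < max_err * max_err)
                break;
        }
    }
    *used = n;
    return mean;
}

/* Demo samplers: a fully lit point, and a point on a hazy shadow edge. */
double flat_sample(void *ctx, int index)  { (void)ctx; (void)index; return 0.5; }
double noisy_sample(void *ctx, int index) { (void)ctx; return (index & 1) ? 1.0 : 0.0; }
```

With these demo samplers, the flat point stops at the minimum sample count while the noisy one runs to the cap, which is the behaviour wanted: the expensive sample counts are only spent where the shadow edge actually varies.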

For now I mainly need to get more specific context into the BSP nodes to know what kind of thing the ray actually struck...

VladR wrote: What is wrong with the 16-bit color mode ? Not enough colors for smooth gradients ?

The most common issue is the extra green bit - it's useful in some cases but leads to green / violet patchiness in other cases. In fact it's positively useful for detailing, but distracting when used in lighting gradients. For the moment I'm using it everywhere equally but likely to drop it from the lightmaps later on. It can be mitigated a bit with dithering but only if the resolution remains fine (i.e. avoiding chunky modes - which I haven't ruled out yet).
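The effect of the extra green bit can be shown with a small C sketch, assuming a 5-6-5 layout packed by plain truncation (the helper names `pack565`, `unpack565` and `grey_green_excess` are made up for this example). Green steps twice as often as red and blue, so a neutral grey ramp picks up a tint on every other step.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8-bit channels into RGB565 by truncation: 5 bits red, 6 green, 5 blue. */
uint16_t pack565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Expand back to 8 bits per channel (the lost low bits stay zero). */
void unpack565(uint16_t c, uint8_t *r, uint8_t *g, uint8_t *b)
{
    *r = (uint8_t)(((c >> 11) & 0x1F) << 3);
    *g = (uint8_t)(((c >> 5) & 0x3F) << 2);
    *b = (uint8_t)((c & 0x1F) << 3);
}

/* Green excess of a grey level after a 565 round trip:
 * zero means neutral, positive means a green cast. */
int grey_green_excess(uint8_t grey)
{
    uint8_t r, g, b;
    unpack565(pack565(grey, grey, grey), &r, &g, &b);
    assert(r == b); /* red and blue quantise identically */
    return (int)g - (int)r;
}
```

With truncation the cast only ever leans green; rounding or dithering the channels independently can push it the other way as well, which would give the violet side of the patchiness. Dropping the low green bit for lightmaps makes all three channels step together.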

dml · Post by **dml** » Wed Apr 08, 2015 3:22 pm

I actually got stuck for a few days with the sun/sky lighting thing and left it alone for a bit. Wasn't getting enough time in one go to get to the bottom of it so progress has been slow.

The actual problem wasn't anything to do with lighting - the q2map BSP tool that I was building from source would not compile my maps without complaining about leaks, even if several other BSP tools seemed to work fine with it. And I *need* to use a tool built from source, because it needs to be modified to assist with sunlight during the lighting step. Using an existing tool doesn't help me here.

So there was clearly something funny with q2map (grrr!)

q2map also contains other fixes/changes I made for sunlight, and supports some features which are needed to debug map building, and it would be annoying to have to involve yet another tool to work around some unknown bug, and risk running into other, different problems and losing some useful features.

Anyway I got hold of the source for one of the other tools (qbsp3) over lunch today and quickly checked file by file for diffs. There were some scary bugfixes on both sides of the fence and some other changes but none of that seemed to help compile my map (grrr!)

I was about to put it down again, and had one of those fortunate accidents, reducing the maximum thread/CPU-core count to 1 while launching one of the tests. Suddenly the map compiles without complaining. hmmm.

Checking the other tools more closely again, I find a tiny hack just after the CPU core detection code which resets the thread count to 1, because... quote from the code:

    ThreadSetDefault ();
    numthreads = 1;		// multiple threads aren't helping..

So the BSP step tries to use multiple CPU cores, but doesn't actually work properly. Forcing it to use a single CPU core works correctly. What a waste of time!

Anyway that problem is gone and the other changes I made allow SKY to be detected with raycasts during the lighting stage, so the sun now can be outside of the skybox without it casting a shadow.

I should be able to get back to the Falcon-oriented bits by the weekend.

VladR · Post by **VladR** » Wed Apr 08, 2015 8:19 pm

While I still haven't kept up with all recent posts, I'll make an exception and react to the latest one:

dml wrote: Checking the other tools more closely again, I find a tiny hack just after the CPU core detection code which resets the thread count to 1, because... quote from the code:
    ThreadSetDefault ();
    numthreads = 1;		// multiple threads aren't helping..

So the BSP step tries to use multiple CPU cores, but doesn't actually work properly. Forcing it to use a single CPU core works correctly. What a waste of time!

We all understand how these multi-core hacks come to life. The production deadline axe hovers over your head, hungry artists loiter in the front lobby screaming for the opportunity of expressing themselves on the torn piece of the digital canvas, methodically detaching their left ear lobe with the razor, but the bloody editor just won't work !

So, you insert your -temporary- hack into the code, honestly believing that you will (of course !) fix it later.

Yeah, right...

Now, if there were few more coders on the tools team, they actually might make the time to fix this.

Now, of course, it's just up to you

dml · Post by **dml** » Wed Apr 08, 2015 8:30 pm

Coder's side story:

The funny thing is, somebody put static variables in the same code that is threaded, and which doesn't work with more than one thread running. And the reason (from comments beside the hack): because the C compiler at the time had a bug and generated faulty code when those vars were defined sensibly...
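For anyone who hasn't been bitten by this before, a tiny illustration in C of why statics and threads don't mix. This is not the q2map code; `vec3`, `midpoint_static` and `midpoint` are hypothetical, and the clash is shown with two back-to-back calls rather than real threads.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Non-reentrant: all callers share one static result slot, so two threads
 * calling this at the same time can overwrite each other's answer. */
const vec3 *midpoint_static(const vec3 *a, const vec3 *b)
{
    static vec3 result; /* shared, hidden state */
    result.x = (a->x + b->x) * 0.5f;
    result.y = (a->y + b->y) * 0.5f;
    result.z = (a->z + b->z) * 0.5f;
    return &result;
}

/* Reentrant: the caller owns the storage, so there is nothing to collide on. */
void midpoint(const vec3 *a, const vec3 *b, vec3 *out)
{
    assert(out != NULL);
    out->x = (a->x + b->x) * 0.5f;
    out->y = (a->y + b->y) * 0.5f;
    out->z = (a->z + b->z) * 0.5f;
}
```

Two calls to `midpoint_static` return the same pointer, so the first result is gone as soon as the second call runs. With one thread that is merely awkward; with several threads it corrupts results at random, which fits the symptom of a map that only compiles with the thread count forced to 1.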

Fortunately the BSP step takes <1 second so I don't need to bother fixing the threading problems - can spend my time better on the Falcon code. The vis and rad steps work ok with multiple CPUs and those are the bits which take the time.

I have my own BSP builder anyway and might make a Quake compiler out of it later since I know the code a lot better. There are some things in that area I wanted to try for a while.

dml · Post by **dml** » Wed Apr 08, 2015 11:55 pm

I have found that by interfering with the lighting system and modulating the lightmap while it is being built, it's possible to add some interesting effects to the scenery.

The case I was experimenting with: marking concrete procedurally with an irregular grid, based on distance from the lit point to the contour of each face. The visual result is something like multitexturing with a low detail texture, breaking up a tiling pattern so it looks like many more base textures are involved.

Some other simple procedures also provide nice effects - e.g. turbulence for gravel and other less regular surfaces. All for free at runtime of course

There's nothing special about this really from a technical standpoint, except for twisting the lighting pass to do something it's not really meant to do. The best things are sometimes free

DarkLord · Post by **DarkLord** » Thu Apr 09, 2015 12:22 am

And like Robert Frost, only discovered if:

"Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference."

you took an alternate path...

Way to go Doug!

dml · Post by **dml** » Thu Apr 09, 2015 8:36 am

Thanks for the poetic context, DarkLord!

The last hack I toyed with last night was detecting topological edges vs internal/construction edges, so the lightmap can be marked/darkened at corners while avoiding flat areas. This produces decent looking (although, quite fake) ambient occlusion at edges, completely cost free.
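A much simplified sketch of the edge darkening in C, assuming a rectangular lightmap whose four sides have already been classified. The flag names, `darken_edges` and its parameters are invented for illustration; the real tool works on arbitrary face contours.

```c
#include <assert.h>
#include <stdint.h>

/* Which sides of the lightmap are real (topological) edges. Sides shared
 * with a coplanar neighbour are construction edges and stay unflagged. */
enum { EDGE_LEFT = 1, EDGE_RIGHT = 2, EDGE_TOP = 4, EDGE_BOTTOM = 8 };

/* Darken luxels near the flagged sides of a w x h lightmap. Luxels on a
 * flagged side are scaled to floor_pct percent, fading linearly back to
 * 100 percent at `radius` luxels away from it. */
void darken_edges(uint8_t *lm, int w, int h, unsigned edges,
                  int radius, int floor_pct)
{
    assert(lm && radius > 0 && floor_pct >= 0 && floor_pct <= 100);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int d = radius; /* distance to nearest flagged side, capped */
            if ((edges & EDGE_LEFT) && x < d) d = x;
            if ((edges & EDGE_RIGHT) && w - 1 - x < d) d = w - 1 - x;
            if ((edges & EDGE_TOP) && y < d) d = y;
            if ((edges & EDGE_BOTTOM) && h - 1 - y < d) d = h - 1 - y;
            int pct = floor_pct + (100 - floor_pct) * d / radius;
            lm[y * w + x] = (uint8_t)(lm[y * w + x] * pct / 100);
        }
    }
}
```

Since this only rewrites luxels that already exist, it costs nothing at runtime and needs no extra lightmap memory.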

This is not a bad result, because the radiosity is a bit too coarse to do this on its own for detailed surfaces. It deals mainly with inter-surface lighting but only rarely picks out small elevations. It sometimes even causes bugs in those areas due to problems with the tools. Lightmap fiddling appears to work quite well for those fine details.

Granted there are better and far more correct ways to do this stuff now if planned from the start - but part of the fun here is messing with some old constraints and seeing what kinds of tricks can be found within that using the tools available.

Anyway I've stopped with this now and won't return to the maps until the remaining holes are closed in the Falcon engine, for transparency on textures and special surfaces.

DarkLord · Post by **DarkLord** » Thu Apr 09, 2015 11:35 am

It's just my humble opinion Doug, but what you and many other talented Atari programmers do is pure poetry.

kristjanga · Post by **kristjanga** » Thu Apr 09, 2015 5:19 pm

What darklord said ^

dml · Post by **dml** » Fri Apr 10, 2015 11:31 am

Based on recent tests, it's clear that collision detection (still implemented in C with floating point) is now the 3rd most significant cost most of the time, even when standing still. The only two items which take longer are polygon scan conversion and drawing.

I don't want to spend too much time optimizing that because it doesn't work 100% properly in the first place, and needs reimplemented. Sliding along angled surfaces can sometimes get you stuck in the walls, and a few other problems.

I will probably do some basic work on this version to get the cost down but won't bother involving any complicated optimizations until the CD algorithm is correct. It seems to be anywhere between 7% and 25% of total time depending on the map and location of the camera which is enough to regularly get in the way even when recording demo videos.

(I could cheat and record/replay the camera action without any CD but I don't see the point in doing that - every version so far has been realtime after all...)

dml · Post by **dml** » Sat Apr 11, 2015 11:36 pm

I have put this project aside for a few days to fix some problems with other projects, and will return to it again after that.

The last progress made was solving the excessive cost of the surface cache preparing surfaces for texturing. I was thinking about threading it (interrupt time slicing) so it would run in the background and play catchup, but this is evil-complicated and doesn't best use up the available dead time on the CPU (i.e. when the DSP is 100% busy and there's nothing else to do anyway).

So I settled on a solution that performs surface cache filling with the CPU while the DSP is running the polygon scan-conversion step, which usually takes a solid 20ms for a reasonably complex scene. Since that time wasn't being used for anything else, any CPU work done in that period is magically 'free'. Better still, I can tell when the DSP is finished scanning, and can stop the surface cache pass at nearly the same time. This ensures that no additional time is wasted. It also has the nice property that more complex scenes take longer to scan - and consequently get more time to prepare surfaces. Nice and balanced!
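The scheduling idea can be sketched in a few lines of C. Everything here is a stand-in: on the Falcon the busy check would poll the DSP host port and the build step would light and cache one surface, whereas this sketch uses plain function pointers and a fake DSP that stays busy for a fixed number of polls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks for "is the DSP still scanning?" and "build one surface". */
typedef int  (*dsp_busy_fn)(void *ctx);
typedef void (*build_fn)(void *ctx, int surface_id);

/* Build pending surfaces only while the DSP is still scan-converting.
 * Returns how many were completed; the rest stay queued for a later
 * frame (and would be flat-filled in this one). */
size_t fill_cache_while_busy(const int *pending, size_t count,
                             dsp_busy_fn busy, build_fn build, void *ctx)
{
    assert(busy != NULL && build != NULL);
    size_t done = 0;
    while (done < count && busy(ctx)) {
        build(ctx, pending[done]);
        done++;
    }
    return done;
}

/* Demo stand-ins: the fake DSP reports busy for a fixed number of polls. */
typedef struct { int polls_left; int built; } demo_ctx;

int demo_busy(void *ctx)
{
    demo_ctx *c = ctx;
    return c->polls_left-- > 0;
}

void demo_build(void *ctx, int surface_id)
{
    (void)surface_id;
    ((demo_ctx *)ctx)->built++;
}
```

A longer scan (more polls) lets more surfaces through, which is the self-balancing property described above; the granularity of one whole surface per poll is an assumption of this sketch.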

So the engine is now nearly as fast while the camera is in motion, as when it remains still.

The downside is that newly visible surfaces tend to be flat-filled when the camera moves at speed, as the surface cache tries to play catchup. But I think for a 16MHz machine it's better to keep the framerate up than to keep the detail level up! Reducing the texture detail level also lets the surface cache catch up more quickly, which is a nice property for optimizing performance.

troed · Post by **troed** » Sun Apr 12, 2015 5:46 am

dml wrote:So the engine is now nearly as fast while the camera is in motion, as when it remains still.

Impressive!

VladR · Post by **VladR** » Wed Apr 15, 2015 2:01 pm

dml wrote:I do want to get transparency working again using the latest version and then will start on a vid.

How exactly are you approaching the transparency ? Just a basic brute-force - e.g. per-texel condition ? I am assuming here you aren't talking about alpha blending - merely on/off, correct ?

When I was working on my own CPU implementation of transparency on Jaguar (it has the HW for that, of course, I merely did this for an exercise), I realized that for animated transparent sprites, you can reduce the amount of checks using additional LUTs by ~60%. That's probably not gonna work with all transparent materials - but it could work for things like transparent windows and such, where long stretches of textures are transparent.

shoggoth wrote:
VladR wrote:How much data cache does Falcon have, again ? As for the FP-emu, as long as there is a reliable and known coefficient (for the given machine), there is no problem with the difference.
http://en.wikipedia.org/wiki/Motorola_68030

Thanks for the link. It refreshed quite a few things in my memory.

VladR · Post by **VladR** » Wed Apr 15, 2015 2:19 pm

dml wrote:I have found that by interfering with the lighting system and modulating the lightmap while it is being built, it's possible to add some interesting effects to the scenery.

The case I was experimenting with: marking concrete procedurally with an irregular grid, based on distance from the lit point to the contour of each face. The visual result is something like multitexturing with a low detail texture, breaking up a tiling pattern so it looks like many more base textures are involved.

Some other simple procedures also provide nice effects - e.g. turbulence for gravel and other less regular surfaces. All for free at runtime of course

I see you, too, are a supporter of procedural texture generation and appreciate the appeal of the fact that the code to generate 16 different variants of a texture has smaller memory footprint than half of one texture

kkrieger on Atari is completely realistic

I could be wrong here, but I thought that lightmap texel density per square meter of Quake's world varied a lot across the whole map. That might magnify the lower resolution discrepancy between neighboring faces sharing the same material (hence, the same procedural detail), wouldn't it ?

dml wrote:There's nothing special about this really from a technical standpoint, except for twisting the lighting pass to do something it's not really meant to do. The best things are sometimes free

Well, we don't really know if Id had some additional plans with the lighting pass. I'm pretty sure JC had few ideas (colored lighting being the very first here), this is a very low-hanging fruit after all (from coder's standpoint, at least - it's easy to change the code to support more lights - you can do it in a weekend, but to build all levels like that is 3 orders of magnitude more work on the level design side). I suspect they had to draw the line somewhere and say - 'enough - ship it', or they would be improving the tech forever...

And this is where the genius of JC is - he always knew where to draw that line, where to stop improving, as by the time everyone else caught up, they were already a generation in advance in their prototypes at E3...

dml · Post by **dml** » Wed Apr 15, 2015 2:28 pm

Hi!

VladR wrote:
dml wrote:I do want to get transparency working again using the latest version and then will start on a vid.
How exactly are you approaching the transparency ? Just a basic brute-force - e.g. per-texel condition ? I am assuming here you aren't talking about alpha blending - merely on/off, correct ?

It depends a lot on the situation - I'm dealing mainly with scenery transparency for Quake maps, where all pixels in a given surface are written but they are translucent, so must be combined with the pixels below.

The method used for the Doom derivative was different because it focused mostly on masked areas for sprites and walls with holes (conditional writing) with optional translucency for the written pixels. In most cases those surfaces were just 'cut outs' and didn't use translucency though - so it was primarily conditional writing. The few cases that used translucency were things like skulls, plasma bolts and glass walls.

Doom transparency:

For the 'conditional writing' case, I developed two drawing paths. One just used brute-force pixel testing, and was used for tiny, distant objects because the setup time was minimal versus a small area. The fillrate trade was better for small items.

For close-up sprites and walls with holes at mid-near distances, I used a second technique which encoded the objects into spans with a kind of 'shortcut map' that indicated how many texture-space pixels remained for a given u,v, so posts could be rendered contiguously and gaps skipped using some fixedpoint increments. This has more costly setup per item (and per gap encountered), but much better fillrate since there is no pixel testing at all.
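A heavily reduced sketch of the span idea in C, to contrast it with per-pixel testing. It is not the BadMood code: it handles one unscaled texture column, uses a made-up colour key and the hypothetical names `run_t`, `encode_column` and `draw_column`, and leaves out the fixed-point u,v stepping that the real 'shortcut map' has to support.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TRANSPARENT 0 /* hypothetical colour key for "hole" texels */
#define MAX_RUNS 16

typedef struct { uint8_t start, length; } run_t;

/* Encode one texture column into runs of opaque texels (done once, at
 * load time). Returns the number of runs written to `runs`. */
int encode_column(const uint8_t *texels, int height, run_t *runs)
{
    assert(height >= 0 && height <= 255);
    int n = 0, y = 0;
    while (y < height) {
        while (y < height && texels[y] == TRANSPARENT) y++; /* skip a gap */
        if (y == height) break;
        int start = y;
        while (y < height && texels[y] != TRANSPARENT) y++; /* measure a post */
        assert(n < MAX_RUNS);
        runs[n].start = (uint8_t)start;
        runs[n].length = (uint8_t)(y - start);
        n++;
    }
    return n;
}

/* Draw the column at 1:1 scale: whole posts are copied and gaps skipped,
 * with no texel tested at draw time. */
void draw_column(const uint8_t *texels, const run_t *runs, int nruns,
                 uint8_t *dst)
{
    for (int i = 0; i < nruns; i++)
        memcpy(dst + runs[i].start, texels + runs[i].start, runs[i].length);
}
```

The trade-off matches the description above: there is setup per post and per gap, so it only pays off once the object covers enough pixels.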

Quake transparency:

The transparency for the Quake project is simpler than the Doom version at a pixel level because there aren't really any holes to deal with - most transparent pixels have some value needing combined with the background. For the holes that do exist, there are too many 'gaps' to bother trying to skip over them due to overheads involved with that. So every pixel is treated the same.

However the complexity involved with Quake transparency is not at the pixel level - it's at the surface level. Quake engines use a zBuffer to write transparent surfaces after the solid scenery surfaces have been written (and zbuffer updated). This means the scenery surface scanconversion algorithm need only deal with a single frontsurface for each pixel. Transparent surfaces are not involved in that.

The Falcon engine has no zbuffer at all, because it is expensive to generate and write. In fact Quake & Quake 2 write the zbuffer pixels as a separate step from colour pixels. That's too much extra work in multiple areas (setup time, bus bandwidth) for this machine to cope with.

So this engine uses a different type of spanbuffer which can handle arbitrary depth complexity - multiple frontsurfaces - so the transparent surfaces get clipped along with the solid faces as if they have been written with z-testing.

So when I'm referring to transparency work for this engine I'm mostly talking about the spanbuffer & depth clipping, as opposed to how the pixels get drawn.

VladR wrote: When I was working on my own CPU implementation of transparency on Jaguar (it has the HW for that, of course, I merely did this for an exercise), I realized that for animated transparent sprites, you can reduce the amount of checks using additional LUTs by ~60%. That's probably not gonna work with all transparent materials - but it could work for things like transparent windows and such, where long stretches of textures are transparent.

Yes LUTs are very useful here but are most effective with an 8bit framebuffer. Both the Doom and Quake projects on Falcon have a 16bit framebuffer so unfortunately LUTs were a bit more costly (but still necessary) for combining colours in some shaders.
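To show why the 8bit case is so convenient, here is a minimal C sketch of a blend table. It assumes an RGB332 pixel layout instead of a real palette purely to keep the example short, and `blend_lut`, `build_blend_lut` and `blend_span` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* 256 x 256 entries = 64 KB: every source/destination pair precomputed. */
static uint8_t blend_lut[256][256];

/* Build the table once. blend_lut[a][b] is the 50/50 mix of pixels a and
 * b, treating each byte as RGB332 (an assumption for this sketch). */
void build_blend_lut(void)
{
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            int r  = (((a >> 5) & 7) + ((b >> 5) & 7)) / 2;
            int g  = (((a >> 2) & 7) + ((b >> 2) & 7)) / 2;
            int bl = ((a & 3) + (b & 3)) / 2;
            blend_lut[a][b] = (uint8_t)((r << 5) | (g << 2) | bl);
        }
    }
}

/* Blend a run of source pixels over the framebuffer: one lookup per pixel. */
void blend_span(uint8_t *dst, const uint8_t *src, int count)
{
    assert(dst && src && count >= 0);
    for (int i = 0; i < count; i++)
        dst[i] = blend_lut[src[i]][dst[i]];
}
```

The same direct table for 16bit pixels would need 65536 x 65536 entries, which is out of the question, so the work has to be split into smaller per-channel lookups or masks. That is where the extra cost comes from.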

It is possible to use the DSP to synchronously combine RGBs but the overhead of exchanging the rgb words over the host port in both directions per pixel reduces its effectiveness a lot...

VladR · Post by **VladR** » Wed Apr 15, 2015 2:29 pm

dml wrote: The last hack I toyed with last night was detecting topological edges vs internal/construction edges, so the lightmap can be marked/darkened at corners while avoiding flat areas. This produces decent looking (although, quite fake) ambient occlusion at edges, completely cost free.

This technique was used in quite a lot of more modern games, actually. I believe it was still used even about 7 years ago.

Even though the games already had some normal-mapping implemented, the amount of VRAM on gfx cards made it a very low-hanging fruit, especially from performance-cost standpoint. I forgot the names of those games, but I clearly recall all of the surfaces in the game (even the small / narrow ones on pillars) to have that lightmap applied on it, hence each surface had an additional 'contrast' / AO visual element in it.

I don't know how much RAM for lightmaps you still have available, but I suppose you merely adjust the same lightmap that is already being used for the given surface - hence no additional memory footprint, correct ?

That could be quite a nice visual boost, if applied across whole map.

dml · Post by **dml** » Wed Apr 15, 2015 2:52 pm

VladR wrote: I see you, too, are a supporter of procedural texture generation and appreciate the appeal of the fact that the code to generate 16 different variants of a texture has smaller memory footprint than half of one texture

I've been a fan of procedural generation since I first saw it in the 1980s.

I have worked on procedural stuff in several past projects, and one (unfinished) was almost entirely procedural:

https://www.youtube.com/watch?v=ghEaNY-IZhM

I like it because you can do a lot yourself with only your imagination, skill and time as the bottlenecks. Although it can take a lot of time.

VladR wrote: I could be wrong here, but I thought that lightmap texel density, per Quake's world square meter varied a lot across the whole map. That might magnify the lower resolution discrepancy between neighboring faces sharing the same material (hence, the same procedural detail), wouldn't it ?

Yes it is quite chunky, but smaller than that - IIRC a luxel is about 30cm wide and that looks about right by eye. (4 units = 3 inches, and a luxel is 16 units by default with no scaling, so 4 x 3 = 12 inches, or about 30cm.)

VladR wrote: Well, we don't really know if Id had some additional plans with the lighting pass. I'm pretty sure JC had few ideas (colored lighting being the very first here), this is a very low-hanging fruit after all (from coder's standpoint, at least - it's easy to change the code to support more lights - you can do it in a weekend, but to build all levels like that is 3 orders of magnitude more work on the level design side). I suspect they had to draw the line somewhere and say - 'enough - ship it', or they would be improving the tech forever...

Well I can't pretend to know what they did/didn't think of back then, but I saw some guys use a more arduous method to do something similar to base textures using the Q3 tools more recently. Lightmaps just looked more convenient to me, if a bit less precise.

But then, I did have to mod the tools to do it, whereas they were working with unmodified tools...

VladR wrote: And this is where the genius of JC is - he always knew where to draw that line, where to stop improving, as by the time everyone else caught up, they were already a generation in advance in their prototypes at E3...

Yes that's very true. He knew when to move on (...or when he'd had enough, or a new distraction was already on the table)
