calimero wrote:
what are dominant instructions in R_RecursiveWorldNode4PL_68k?
Fortunately for us, multiplies
In fact I am not using the original algorithm from Q2 because it was way too heavy for the Falcon's CPU. I factored out anything that was not 'core' to the algorithm into separate passes, so only the core part is left, and that alone helped speed it up a lot already.
It is fortunate that there was so much stuff in there which could be made sequential/serialized and therefore decoupled from the BSP algorithm.
This is mainly how I got it inside the 256-byte instruction cache. It would not have been possible without that refactoring work (ignoring the DSP, for now).
calimero wrote:
and what is basic algorithm for this?
Fundamentally it is a recursive BSP descent algorithm, but it also incorporates frustum culling for geometry nodes and some other stuff. There are important details which affect us - much more than they affect PCs in recent years. I'll explain why below.
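In rough C, the core of the walk looks something like this - just a sketch with simplified stand-in types and names, not the actual Q2 or Falcon code (the real thing folds the culling, sort keys and face marking described below into this same walk):

[code]
/* Simplified sketch of a front-to-back recursive BSP walk.
   Types and names are illustrative, not the real engine structures. */
typedef struct node_s {
    int            contents;        /* negative = leaf, otherwise internal node */
    float          normal[3];       /* splitting hyperplane */
    float          dist;
    struct node_s *children[2];     /* [0] = front side, [1] = back side */
} node_t;

static float Dot3(const float a[3], const float b[3])
{
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

void RecursiveWorldNode(node_t *node, const float vieworg[3])
{
    if (node->contents < 0)
        return;                     /* leaf: mark/emit its surfaces here */

    /* which side of this node's splitting plane is the camera on? */
    int side = (Dot3(vieworg, node->normal) - node->dist) < 0.0f;

    RecursiveWorldNode(node->children[side], vieworg);   /* nearer half first */
    /* ...emit the faces stored on this node here...                          */
    RecursiveWorldNode(node->children[!side], vieworg);  /* then farther half */
}
[/code]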
calimero wrote:
I ask this because I stumbled on a comment at
http://www.celephais.net/board/view_thr ... 01&end=125 that says that "RecursiveWorldNode" could be skipped

(not sure what the original function does, but he proposes to "build a list of surfaces visible in the current pvs each time the view leaf changes")
Yes - and that is pretty much true, on a PC with a GPU / a fast hardware path for geometry and clipping. It has been true for some years. I did the same test with the code on PC before I did any Falcon bits - just drawing the PVS groups directly.
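That 'skip it' approach looks roughly like the sketch below - all the names (leafs, ClusterPVS and so on) are placeholders standing in for the real map structures. Whenever the camera enters a new leaf you rebuild a flat surface list from the PVS bits and hand the whole thing over unsorted, letting the ZBuffer deal with occlusion:

[code]
/* Sketch of the "skip RecursiveWorldNode" approach: when the view leaf
   changes, rebuild a flat list of surfaces from the PVS bits and submit it
   unsorted (occlusion handled by a ZBuffer). All names are placeholders. */
typedef struct {
    int  cluster;          /* -1 = not in any vis cluster          */
    int  numsurfaces;
    int *surfaces;         /* indices of the surfaces in this leaf */
} leaf_t;

extern leaf_t leafs[];
extern int    numleafs;
extern const unsigned char *ClusterPVS(int cluster);  /* decompressed vis row */

static int visible_surfaces[65536];
static int num_visible;

void BuildVisibleSurfaceList(int viewcluster)
{
    const unsigned char *pvs = ClusterPVS(viewcluster);
    num_visible = 0;
    for (int i = 0; i < numleafs; i++) {
        int c = leafs[i].cluster;
        if (c < 0 || !(pvs[c >> 3] & (1 << (c & 7))))
            continue;                       /* this leaf's cluster is not visible */
        for (int s = 0; s < leafs[i].numsurfaces; s++)
            visible_surfaces[num_visible++] = leafs[i].surfaces[s];
    }
    /* a GPU can digest this list directly; a software rasteriser has now
       lost the BSP ordering, which is exactly the problem described below */
}
[/code]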
Stepping back a bit - before going into details - I picked Q2 as a target for a few reasons. One of those reasons is that it is kind of the 'last' new engine to really put effort into software rasterisation. A lot of the complexity in the map representation and rendering is aimed at software rasterisation - so much so that it actually hurts hardware rendering performance.
It is not a coincidence that Q3 dropped software rasterisation and started supporting things like curves - and I didn't bother attacking it for Falcon. While it may be doable, it is far less likely to produce good results because no compromises were made to assist software (except, perhaps, by accident or to save dev time - but not by intent).
The PVS is responsible for finding out what can probably be seen from where you are standing. It doesn't care what direction you are looking in - it is a static lookup based on your position in 3D space. (The BSP assists with this PVS lookup, but as a separate task from its drawing responsibilities - it's just used as a quick 3D search.)
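That 'quick 3D search' is just a point-in-leaf walk down the tree, something like this (reusing the node_t sketch above; the leaf you land in gives the cluster number that indexes the static PVS table):

[code]
/* The BSP used purely as a point search: descend from the root, always
   taking the child on the camera's side of the split plane, until a leaf
   is hit. The leaf's cluster then selects a precomputed PVS row. */
node_t *PointInLeaf(const float p[3], node_t *root)
{
    node_t *node = root;
    while (node->contents >= 0)                       /* still an internal node */
        node = node->children[(Dot3(p, node->normal) - node->dist) < 0.0f];
    return node;                                      /* the leaf containing p */
}
[/code]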
There is a separate step responsible for further narrowing what can be seen, in the direction you are actually looking. That is the frustum culling step. It is rolled into the RecursiveWorldNode pass. GPUs are fast enough that this is basically a waste of time on a modern PC - it takes less time to just chuck whole meshes at the GPU and let it sort out what is inside or outside of the viewport (for this kind of content anyway).
So the first useful thing RecursiveWorldNode does is provide viewport culling for the software rasteriser. It's actually quite a nice technique because it is smart enough to turn off clipping planes early in the BSP walk once they stop contributing, so they don't hurt performance deeper in the walk across lots of smaller meshes.
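Roughly, the clip-plane switch-off works like the sketch below - frustum[], BoxOnPlaneSide() and the node bounding boxes are stand-ins for the engine's own data:

[code]
/* Sketch of the clip-flag trick: test a node's bounding box against each
   still-active frustum plane; planes the box is fully inside get switched
   off for the entire subtree, so deep (small) nodes test few or no planes. */
#define NUM_FRUSTUM_PLANES 4

typedef struct { float normal[3]; float dist; } plane_t;
extern plane_t frustum[NUM_FRUSTUM_PLANES];

/* returns 1 = box fully in front, 2 = fully behind, 3 = straddles */
extern int BoxOnPlaneSide(const float mins[3], const float maxs[3],
                          const plane_t *p);

int CullNode(const float mins[3], const float maxs[3], int *clipflags)
{
    for (int i = 0; i < NUM_FRUSTUM_PLANES; i++) {
        if (!(*clipflags & (1 << i)))
            continue;                        /* already passed higher up the tree */
        int side = BoxOnPlaneSide(mins, maxs, &frustum[i]);
        if (side == 2)
            return 1;                        /* fully outside: cull whole subtree */
        if (side == 1)
            *clipflags &= ~(1 << i);         /* fully inside: stop testing below  */
    }
    return 0;                                /* visible (possibly still clipped)  */
}
[/code]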
GPUs are equipped with a ZBuffer, so for the most part it's possible to chuck unsorted meshes at hardware and have it draw correctly. Transparency is a bit more complicated, but not that much. The software rasteriser in Q2 was equipped with a ZBuffer too, but not for sorting the scenery - it was used only for sorting fine meshes/objects with a small fill area, because ZBuffer-testing pixels in software is very slow.
So the second thing RecursiveWorldNode does is generate integer depth-sorting keys for scenery (map) polygons. These sort keys are used by the software scan-conversion step to handle occlusion and guarantee zero overdraw in the framebuffer. Without the BSP sort keys, the only alternative is to sort individual spans by their nearest z value. Some engines do this, but the z values need to be precise, and typically floating point - which we want to avoid. It also leads to drawing errors, because it's a numerical problem rather than a logical one (the BSP is mostly logical).
It is notable that Abrash mentions this in his Quake 1 tech walkthrough - sorting spans by z causes a lot of additional correctness problems, and was abandoned. It will be worse if we try to do that with integers.
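The key generation itself is almost free once the walk order is right - something like this sketch (names loosely modelled on the Q2 software renderer): because nodes are visited near-to-far, bumping an integer counter per node gives every face a key where smaller simply means nearer.

[code]
/* Sketch: integer sort keys fall out of the near-to-far walk for free.
   Each node visited bumps a counter; all faces on that node share the key.
   The later scan-conversion pass resolves occlusion with integer compares,
   no per-pixel Z. Names are illustrative. */
typedef struct {
    int sortkey;            /* lower = nearer = wins the occlusion test */
    /* ...geometry, texture info, etc... */
} face_t;

static int r_currentkey;    /* reset to 0 at the start of every frame */

/* called once per node, at the point in the walk where its faces are emitted */
void AssignNodeSortKeys(face_t **faces, int numfaces)
{
    int key = r_currentkey++;
    for (int i = 0; i < numfaces; i++)
        faces[i]->sortkey = key;
}
[/code]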
The third thing RecursiveWorldNode provides is a very nice way to perform back-face removal using the BSP, without testing every face. Again, hardware will do this for free, but for a software rasteriser there is a cost per individual face. Each BSP hyperplane can host many sub-faces, and each hyperplane is already tested in order to generate the sort keys and perform culling - the same test can be used to eliminate all faces opposing the active hyperplane side.
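In sketch form (SURF_PLANEBACK mirrors the Quake-style flag, the rest is made up for the example): each face records which side of its hyperplane it points towards, and the camera-side test already done for the traversal tells you which of the two orientations can possibly be front-facing.

[code]
/* Sketch: backface removal without a per-face dot product. The side test
   computed for the traversal gives the camera's side of the hyperplane;
   only faces oriented towards that side can be visible. */
#define SURF_PLANEBACK 2            /* face points along the plane's back side */

typedef struct {
    int flags;
    /* ...geometry... */
} nodeface_t;

void EmitFrontFacing(nodeface_t *faces, int numfaces, int camera_on_back_side)
{
    int wanted = camera_on_back_side ? SURF_PLANEBACK : 0;
    for (int i = 0; i < numfaces; i++) {
        if ((faces[i].flags & SURF_PLANEBACK) != wanted)
            continue;               /* faces away from the camera, skipped for free */
        /* ...emit faces[i] for drawing... */
    }
}
[/code]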
I built a further optimisation on top of that - by reorganising faces within each node into two groups (front, back) and 'blitting' the whole group instead of checking each face's orientation code. This provided a significant speedup on Falcon versus the original code I started with.
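Roughly, the regrouping looks like this (field names invented for the example): at load time each node's faces are partitioned into a contiguous front group followed by a contiguous back group, so at runtime the side test picks a whole range and it can be emitted in one block, with no per-face check at all.

[code]
/* Sketch: faces pre-partitioned per node into [front group | back group],
   so the traversal's side test selects a whole contiguous range at once.
   Field names are illustrative. */
typedef struct {
    int first_face;     /* index of the node's first face                 */
    int num_front;      /* count of faces oriented with the plane's front */
    int num_back;       /* count of faces oriented with the plane's back  */
} facegroups_t;

/* returns the contiguous range of potentially front-facing faces */
void NodeVisibleFaceRange(const facegroups_t *g, int camera_on_back_side,
                          int *first, int *count)
{
    if (camera_on_back_side) {
        *first = g->first_face + g->num_front;   /* back group  */
        *count = g->num_back;
    } else {
        *first = g->first_face;                  /* front group */
        *count = g->num_front;
    }
}
[/code]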
So for our machine, sidestepping the RecursiveWorldNode function creates additional correctness problems and loses some nice optimisations, which just pushes per-face costs everywhere else. This is mainly because RecursiveWorldNode was designed to help accelerate a software rasteriser with a high overhead per face, per vertex, and no ZBuffer. Those jobs are redundant on modern hardware, so there, sidestepping it is the better optimisation.
So that's why the Falcon version will keep RWN and I'll be trying to split it between CPU and DSP - it does too many useful things to give up.
