The DSP is now transforming and projecting the scene as fast as the CPU can send the x,y,z coordinates - that's like 'free hardware 3D'. So while there is still a lot of room to optimize the DSP code, there isn't any benefit in doing so until the CPU side gets some work. I've left that side of things alone for now.
I have also figured out a way to avoid sending all of the scenery every frame. However, once again it's not worth doing if the CPU can't figure that out faster than it can copy the data. To be worthwhile it's best done per vertex group, which probably means per map cluster.
I'm working on the 2D clipper now, which is quite lengthy and hard to debug. From what I can see, Quake clips faces 'properly' with dot products against frustum planes. It's simple, compact and eliminates vertices and edges early, before projection.
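For reference, here is a minimal sketch of that kind of plane clip: one convex face clipped against one plane using signed distances from dot products. It is plain floating-point C, not Quake's code and not the DSP routine; `vec3`, `plane`, `clip_poly_to_plane` and `MAX_CLIP_VERTS` are made-up names for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLIP_VERTS 16   /* hypothetical limit for this sketch */

typedef struct { float x, y, z; } vec3;

/* A point p is on the visible side when dot(normal, p) - dist >= 0. */
typedef struct { vec3 normal; float dist; } plane;

static float plane_side(const plane *pl, vec3 p)
{
    return pl->normal.x * p.x + pl->normal.y * p.y + pl->normal.z * p.z
           - pl->dist;
}

/* Clip a convex polygon against one plane.
   'out' needs room for n_in + 1 vertices. Returns the output count,
   which is 0 when the whole face is behind the plane. */
size_t clip_poly_to_plane(const plane *pl, const vec3 *in, size_t n_in,
                          vec3 *out)
{
    size_t n_out = 0;
    assert(n_in <= MAX_CLIP_VERTS);

    for (size_t i = 0; i < n_in; i++) {
        vec3 a = in[i];
        vec3 b = in[(i + 1) % n_in];
        float da = plane_side(pl, a);
        float db = plane_side(pl, b);

        if (da >= 0.0f)
            out[n_out++] = a;               /* keep visible vertex */

        if ((da >= 0.0f) != (db >= 0.0f)) { /* edge crosses the plane */
            float t = da / (da - db);       /* 0..1 along a->b */
            vec3 p = { a.x + t * (b.x - a.x),
                       a.y + t * (b.y - a.y),
                       a.z + t * (b.z - a.z) };
            out[n_out++] = p;
        }
    }
    return n_out;
}
```

A full frustum clip would run this once per plane, feeding each output into the next call. The division that finds `t` is where limited precision matters most.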
I'm wary of using this on the DSP with integer math because it relies a lot on precision, with failures causing edges to wander off the screen. This gets worse for near z's and the amount of wander could be significant - enough to corrupt memory or cause other badness. It also needs a margin around the display to absorb small errors, and this doesn't work at all if you want to shrink the window smaller than the display.
That approach is just a potential nightmare of bugs and weirdness, so I'm going to sidestep it completely and clip edges in 2D, after projection. Once that works, a full 3D clipper can always be tried with a working 2D version to compare against. I have no idea which will be faster because 2D clipping is trivial on the DSP, the amount of projection work saved by a 3D clipper is moderate to small, and a winding clipper using dot products means having the DSP deal with higher-level representations of faces (sending a bit more data). So it's not worth guessing what works best at this point. I'll try the reliable one first, and then perhaps the other one later.
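To illustrate clipping an already-projected edge against the window rectangle using only integer math, here is a sketch based on the classic Cohen-Sutherland outcode method. That algorithm is swapped in purely as an example, since the post doesn't say which method the DSP clipper uses. `clip_rect`, `outcode` and `clip_edge_2d` are hypothetical names, and the 64-bit intermediates are a convenience of plain C, not something the DSP offers.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive window bounds in screen pixels. */
typedef struct { int32_t xmin, ymin, xmax, ymax; } clip_rect;

enum { OUT_LEFT = 1, OUT_RIGHT = 2, OUT_TOP = 4, OUT_BOTTOM = 8 };

static int outcode(const clip_rect *r, int32_t x, int32_t y)
{
    int code = 0;
    if (x < r->xmin) code |= OUT_LEFT;
    else if (x > r->xmax) code |= OUT_RIGHT;
    if (y < r->ymin) code |= OUT_TOP;
    else if (y > r->ymax) code |= OUT_BOTTOM;
    return code;
}

/* Clip the edge (x0,y0)-(x1,y1) to the rectangle, in place.
   Returns 1 if some part is visible, 0 if the edge is rejected. */
int clip_edge_2d(const clip_rect *r,
                 int32_t *x0, int32_t *y0, int32_t *x1, int32_t *y1)
{
    int c0, c1;
    assert(r->xmin <= r->xmax && r->ymin <= r->ymax);
    c0 = outcode(r, *x0, *y0);
    c1 = outcode(r, *x1, *y1);

    for (;;) {
        if ((c0 | c1) == 0) return 1;   /* both endpoints inside */
        if (c0 & c1) return 0;          /* both outside the same side */

        int c = c0 ? c0 : c1;           /* pick an outside endpoint */
        int64_t dx = (int64_t)*x1 - *x0;
        int64_t dy = (int64_t)*y1 - *y0;
        int32_t x, y;

        /* dx (or dy) cannot be zero in the branch that divides by it:
           that would put both endpoints outside the same side. */
        if (c & OUT_LEFT) {
            x = r->xmin;
            y = (int32_t)(*y0 + dy * ((int64_t)x - *x0) / dx);
        } else if (c & OUT_RIGHT) {
            x = r->xmax;
            y = (int32_t)(*y0 + dy * ((int64_t)x - *x0) / dx);
        } else if (c & OUT_TOP) {
            y = r->ymin;
            x = (int32_t)(*x0 + dx * ((int64_t)y - *y0) / dy);
        } else {
            y = r->ymax;
            x = (int32_t)(*x0 + dx * ((int64_t)y - *y0) / dy);
        }

        if (c == c0) { *x0 = x; *y0 = y; c0 = outcode(r, x, y); }
        else         { *x1 = x; *y1 = y; c1 = outcode(r, x, y); }
    }
}
```

Because each clipped endpoint is set exactly to a window boundary, the result can never land outside the rectangle, whatever rounding happens in the division. That property is what makes the 2D route attractive as the reliable baseline.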
The clipper will probably be a bit of a bottleneck for inserting edges whatever is used, but it can be optimized in various ways and is still probably an order of magnitude less expensive than the next stage needing solved.
All of the stages so far are resolution-independent. The amount of work does not shrink as you reduce the size of the window being drawn. It is all processed as coordinates and adjacency information. This is one of the reasons I have attacked it first, and am putting some effort into estimating how much its cost can be reduced - because once that limit is found it will set the upper bound for the framerate, in any drawing mode and at any window size. If it turns out that most maps produce too much work of this kind, that affects the feasibility of everything else and any decisions about what to do next.
Nearly all of the stages which follow will be resolution-dependent, and costs will change as the window is reduced. This is obviously nice because it's a last-resort band-aid for the later stages not being fast enough.
Ignoring textures for now, there are two obvious 'serious bottlenecks' which will need overcome on a stock Falcon.
1) Edge insertion to the spanbuffer. This is a linear linked-list insertion sort in Quake (IIRC - need to re-read it!), and that is going to hurt because it relies on PVS and clipping to dodge unwanted insertions, and hidden surface removal can't be done until after insertion of all potentially visible edges.
2) Span drawing by CPU.
I have had a tested solution for (1) since around 1997, so that should be ok. (2) is a matter of writing very fast 68k and crossing fingers.
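To show why (1) hurts, here is a rough sketch of a linear sorted insert into a singly linked edge list, matching the description above. It is not taken from the Quake source, and it is not the 1997 solution mentioned. The `edge` struct and `insert_edge_sorted` are hypothetical and pared down to just a sort key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pared-down edge record: only the x sort key and the list link. */
typedef struct edge {
    int32_t x;
    struct edge *next;
} edge;

/* Insert 'e' into the list at 'head', keeping it sorted by ascending x.
   Returns the (possibly new) head. Every insert walks the list from
   the start, so n inserts cost on the order of n*n steps in the
   worst case. */
edge *insert_edge_sorted(edge *head, edge *e)
{
    edge **link = &head;
    assert(e != NULL);

    /* Advance until the next node's x is not smaller than e->x. */
    while (*link != NULL && (*link)->x < e->x)
        link = &(*link)->next;

    e->next = *link;
    *link = e;
    return head;
}
```

Each edge that survives PVS and clipping pays for a walk like this, which is why cutting unwanted insertions early matters so much with this scheme.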