
Thanks for reading.
I think I figured out a way to chop away another 20% CPU time.
There are several steps which need to get done on the queues emitted from the BSP tree traversal, which comprises the visible geometry in the scene on each frame. The BSP tree traversal does an efficient job of performing an intersection-style set operation between the static PVS (what can be seen from the player's position, regardless of view direction) and the dynamic viewcone (what can be seen in the direction the camera is looking).
So the queue of work being emitted from the BSP every frame is a narrow subset of the whole map - only whats in front of the camera, and not behind some big obstacle - but it's still quite a lot of stuff.
This queue of stuff is made up of leaf meshes which break down into polygons, edges and vertices. These are linked to each other using either pointers or indices - both of them able to reach across the whole pool of RAM. Unfortunately this stuff needs to be sent to the DSP, and there's no room there to store all the geometry from the whole map, so the pointers and indices are useless. The DSP can't reach vertex #14205 if it only has room for 1000 vertices in the first place.
So one of the things done in this queue is reindexing the vertices and edges, linked off the visible faces. The reindexing compacts the links so the largest index is based on the number actually onscreen (instead of the number in the whole map, which is huge). This lets everything fit on the DSP for one frame.
The reason indexing is needed at all (rather than just send the geometry raw) is the amount of sharing going on - vertices are shared by edges where they connect, and edges are shared by polygons where they join. Sharing means far fewer vertices need processed in total and that's a significant cost on its own.
But this reindexing needs to be done on the whole queue every frame, and costs about 20% total time. This is pretty annoying. It might be possible later to get the DSP to reindex stuff as it is sent over but it involves sending over the 'real names' for the vertices when they get rotated/projected etc. so things tie up properly later, and the whole thing is pretty complicated. Trying to avoid.
However I noticed that I had set things up in such a way that these work queues coming out of the BSP tree are self-contained and always complete, even if the BSP traversal is paused. So deliberately stopping the BSP while walking around just leaves you with a room which doesn't extend through the corridor when you walk through the door. The geometry gets frozen and no new visible stuff gets added for the next frame. But the view still updates and you can still move the camera around because the queue is still being transformed and drawn. It's just not being edited.
Anyway I figured a neat way to cut the cost of the reindexing (and maybe some other stages), is to figure out if the BSP actually produced anything different between subsequent frames. Use a revision counter, hash or something to track pieces of the map which turned on since the previous frame, and force the reindexing stage to be refreshed only then.
This would mean waving the camera around causes occasional extra work as bits of map move into view, which would be every few frames (since most map pieces are quite big) instead of exactly every frame.
I haven't tried a full version of this yet - only a hack to pause reindexing - but I think it should work. It certainly can't cost more than it does now, and must cost less at least most of the time. It doesn't even need done during the BSP algorithm, it can be done as part of processing the top level queue of leaf meshes which get emitted, before it deals with faces.
I'll need to do a test and see if its going to work.