Some coding notes today.
Just quickly tested a (probably) better way to get subtexel-accurate texturing with generated code spans. The idea seems to work, but no drawing routines are written yet. Just playing with the generator.
The idea is to allow the codegen pass to search for 'approximate hits' for a given number of subtexel offset fraction bits, while generating the code for the span. It may find decent approximations at random points along the span, depending on the rate of the span. Each hit gets logged in an indexing table as it is found.
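A minimal Python sketch of that search, not the actual generator. It assumes the low 16 bits of the U accumulator are the texel fraction (so 0x4000 is a 0.25 texel offset), that a hit is the first pixel whose fraction lands within a tolerance of the target, and that the fraction is compared circularly. The names `find_hits`, `tolerance` and `max_steps` are made up for illustration. The fixed tolerance of 0x1000 is also an assumption: it happens to reproduce the rows in the generator output further down, but the real generator uses a rate-dependent error.

```python
FRAC_BITS = 16                      # assumed: low 16 bits of U are the fraction
FRAC_ONE = 1 << FRAC_BITS
FRAC_MASK = FRAC_ONE - 1

def find_hits(rate, slots=4, tolerance=0x1000, max_steps=256):
    """Return {target: (accumulator, pixel_index)} for each subtexel slot.

    A slot is missing from the result if no pixel within max_steps lands
    close enough to it (the case where a unique span would be needed).
    """
    step = FRAC_ONE // slots
    targets = [s * step for s in range(slots)]
    hits = {}
    acc = 0
    for i in range(max_steps):
        frac = acc & FRAC_MASK
        for t in targets:
            if t in hits:
                continue            # keep the first hit for each slot
            d = (frac - t) & FRAC_MASK
            err = min(d, FRAC_ONE - d)   # circular distance to the target
            if err < tolerance:
                hits[t] = (acc, i)
        if len(hits) == slots:
            break                   # all slots filled, stop searching
        acc += rate
    return hits

# Print the result in roughly the same shape as the generator log.
for target, (acc, i) in sorted(find_hits(0x1780).items()):
    print(f"subtexel [{target:08x}] a:[{acc:08x}] i:{i}")
```

With only non-negative pixel offsets searched, the pixel index doubles as the compensation value.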
For spans where a good approximation can't be found for all offsets within the length of the span, a unique span can be generated at small extra cost. This turns out to be very rare. For cases where hits are found within the span length, the span length is adjusted upwards by the start position of the last hit (so all N offsets can complete a span as before).
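The length adjustment can be sketched in a couple of lines of Python. The function name and the example span length of 64 are hypothetical; the hit indices are the ones logged for rate 0x1780 in the generator output.

```python
def generated_length(span_len, hit_indices, slots=4):
    """Pixels of code to emit so every slot can still run span_len pixels.

    Returns None when some slot has no hit, i.e. the rare case where a
    unique span would have to be generated instead.
    """
    if len(hit_indices) < slots:
        return None
    # The slot that starts latest still needs span_len pixels after it.
    return span_len + max(hit_indices)

# Hypothetical 64-pixel span, hits at pixels 0, 3, 5 and 8.
print(generated_length(64, [0, 3, 5, 8]))
```

A slot that starts at pixel 8 runs pixels 8 to 71, so the generated code has to cover 72 pixels.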
The key to making it work is allowing a flexible amount of error which is rate-dependent, plus recording the amount of offset compensation needed to begin rendering from a given offset into the generated code. You only need to store a small table for this because there are a limited number of offsets being searched (currently 4, but maybe increased to 16 when it's working).
Every span needs some extra generated code to do this, but hits for all offsets are commonly found early.
I'm trying to calculate the cost of storing the extra generated code - but given that the majority of span rates can fill all 4 subtexel slots within the first 20 pixels, the average cost is something like 15-20% more than without subtexel... (versus 400% for the simpler solution - generating unique code for every rate AND subtexel offset).
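A rough way to see where a figure like that can come from, as a Python sketch. The span length of 128 pixels is purely hypothetical (the real span lengths are not given here), and the model only counts generated pixels, ignoring the size of the offset table.

```python
def extra_cost(span_len, last_hit):
    """Fraction of extra code when the span is extended by the last hit."""
    return last_hit / span_len

def naive_cost(slots):
    """Total code, relative to one span, when every slot gets its own copy."""
    return float(slots)

# Hypothetical: 128-pixel span, all 4 slots filled by pixel 20.
print(f"shared span: +{extra_cost(128, 20):.1%}")
print(f"unique spans: {naive_cost(4):.0%} of a single span")
```

Under those assumptions the shared span costs about 16% extra, while one unique span per subtexel offset costs four full copies.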
You can save some more space by allowing -ve as well as +ve approximations, but this adds a complication: overrunning the start of the texture row. This is bad if the textures are not padded and your c2p routine accepts only pre-shifted pixels, as mine does (crash!).
Overall, using this should (A) speed up wall column drawing, because only one call is needed per column to achieve subtexel accuracy (instead of splitting the column at the horizon / zero fraction), and (B) take less space than generating spans in both directions (upper/lower spans). It is more fiddly to dispatch a call though, and requires storing the subtexel fraction in the column itself, which I previously managed to avoid.
So that's it. Somebody might find this route interesting to play with.
[EDIT]
Some output from the generator, showing texture U stepping rate for each generated span, and the 4 offsets which produce approximate hits for 0.25 texel offsets. The vast majority of stepping rates get solved within about 12 steps, which costs basically nothing. The occasional spike occurs, but doesn't really impact storage. The error tolerance is 1/4 the texel rate, which seems to be good enough in most cases but could probably be reduced.
rate: [00001780]
subtexel [00000000] found >> match: a:[00000000] i:0 c:0
subtexel [00004000] found >> match: a:[00004680] i:3 c:3
subtexel [00008000] found >> match: a:[00007580] i:5 c:5
subtexel [0000c000] found >> match: a:[0000bc00] i:8 c:8
rate: [00001800]
subtexel [00000000] found >> match: a:[00000000] i:0 c:0
subtexel [00004000] found >> match: a:[00004800] i:3 c:3
subtexel [00008000] found >> match: a:[00007800] i:5 c:5
subtexel [0000c000] found >> match: a:[0000c000] i:8 c:8
rate: [00001880]
subtexel [00000000] found >> match: a:[00000000] i:0 c:0
subtexel [00004000] found >> match: a:[00003100] i:2 c:2
subtexel [00008000] found >> match: a:[00007a80] i:5 c:5
subtexel [0000c000] found >> match: a:[0000c400] i:8 c:8
rate: [00001900]
subtexel [00000000] found >> match: a:[00000000] i:0 c:0
subtexel [00004000] found >> match: a:[00003200] i:2 c:2
subtexel [00008000] found >> match: a:[00007d00] i:5 c:5
subtexel [0000c000] found >> match: a:[0000c800] i:8 c:8
rate: [00001980]
subtexel [00000000] found >> match: a:[00000000] i:0 c:0
subtexel [00004000] found >> match: a:[00003300] i:2 c:2
subtexel [00008000] found >> match: a:[00007f80] i:5 c:5
subtexel [0000c000] found >> match: a:[0000b280] i:7 c:7
rate: [00001a00]
subtexel [00000000] found >> match: a:[00000000] i:0 c:0
subtexel [00004000] found >> match: a:[00003400] i:2 c:2
subtexel [00008000] found >> match: a:[00008200] i:5 c:5
subtexel [0000c000] found >> match: a:[0000b600] i:7 c:7
rate: [00001a80]
subtexel [00000000] found >> match: a:[00000000] i:0 c:0
subtexel [00004000] found >> match: a:[00003500] i:2 c:2
subtexel [00008000] found >> match: a:[00008480] i:5 c:5
subtexel [0000c000] found >> match: a:[0000b980] i:7 c:7
a: is the texture U accumulator at that pixel offset
i: is the pixel offset
c: is the amount of offset compensation needed (currently always == offset, because only +ve approximations have been allowed in this run)
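For illustration, here is how a per-rate table built from those three fields might be consulted at draw time. This is a Python sketch under assumptions: the slot is taken from the top bits of a 16-bit U fraction by truncation, and `pick_entry` is a made-up name. How the compensation value is then applied (texture pointer, destination pointer, or both) is not covered above, so it is left out.

```python
# (a, i, c) rows copied from the log for rate 00001780, one per subtexel slot.
TABLE_1780 = [
    (0x0000, 0, 0),
    (0x4680, 3, 3),
    (0x7580, 5, 5),
    (0xBC00, 8, 8),
]

def pick_entry(u_frac, table, frac_bits=16):
    """Pick the table row for a column's U fraction.

    Assumes truncation to the slot below; rounding to the nearest slot
    would be an equally plausible choice.
    """
    slot = (u_frac * len(table)) >> frac_bits
    _acc, index, comp = table[slot]
    return slot, index, comp

# A column whose U fraction is 0x9000 falls in slot 2 (the 0x8000 target).
print(pick_entry(0x9000, TABLE_1780))
```

The caller would then enter the generated code at pixel `index` rather than at pixel 0.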
I already manage to keep the number of generated rates really low, and precision really high via mipmapping. The texel rate (for the chosen mipbias) is always bounded in the range between 0.5 and 2.0 so nearly all of the generated rates are spent on precision. That's a triple-win.
