I stumbled across two more things that could be interesting here.
First I discovered that the good old trick on 68000 replacing
also saves 2 cycles on the 68020/30. That saved 15% on all my old 030 texture-mapping inner loops. I wonder why no one did this in the old days, myself included...
I don't think this is practical for texture mapping in big engines like Doom or Quake, because it needs the data to be aligned to 64k boundaries, and you don't want to waste all that memory. 16-bit truecolor mapping is also a problem there, because you lose the nice dn*2 feature. But it could probably work in other areas like transform, culling, BSP traversal, ...
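To illustrate why the 64k alignment matters, here is a rough C sketch (not the actual 68k instruction sequence, which is not reproduced here). It assumes an 8-bit 256x256 texture, and the helper names `alloc_texture`, `fetch_indexed` and `fetch_patched` are made up for this example. When the base address is 64k aligned, its low 16 bits are zero, so a 16-bit texel offset can be written straight into the low word of the pointer instead of being added to it:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TEX_BYTES 65536u  /* assumed: 256x256 texels, 8 bits each */

/* Hypothetical helper: allocate a texture on a 64k boundary (C11). */
static uint8_t *alloc_texture(void)
{
    return aligned_alloc(TEX_BYTES, TEX_BYTES);
}

/* Ordinary indexed fetch: base + offset, works for any base address. */
static uint8_t fetch_indexed(const uint8_t *base, uint16_t offset)
{
    return base[offset];
}

/* Fetch by overwriting the low 16 bits of the address with the offset.
 * Only valid when base is 64k aligned, which is what makes the trick
 * cost memory. The pointer/integer casts assume a flat address space. */
static uint8_t fetch_patched(const uint8_t *base, uint16_t offset)
{
    uintptr_t addr = (uintptr_t)base;
    assert((addr & 0xFFFFu) == 0);      /* precondition: 64k alignment */
    addr |= offset;                     /* same result as addr + offset */
    return *(const uint8_t *)addr;
}
```

Both fetches return the same texel for every 16-bit offset as long as the texture sits on a 64k boundary. With 16-bit texels the same 64k window would only cover 32768 texels and the offset would have to be doubled first.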
The other thing on my mind: in the old days there were a lot of rumours that some Falcon demo coders used the blitter to transfer data between the DSP and memory, reaching a transfer rate of 3.5 MB/s.
I'm not sure whether that is an urban legend. I ran a quick test using Hatari 1.9, since I don't have access to a real Falcon 030.
But the test results didn't look good.
In Hog_bus mode the transfer ran at roughly the expected speed, but didn't work: it kept transferring the same word over and over again, as if the DSP were locked out of the bus. After some time the DSP even locked up.
Without Hog_bus mode the transfer more or less works, but is very slow (3 times slower than an unrolled CPU loop).
I guess Hatari differs massively from a real Falcon in these edge cases.
It would be interesting to know whether such a hack really works on a real Falcon, and whether Doug's engine would profit from it.