Thanks for your suggestion. Doug gave me similar advice and I tried it, but unfortunately there's no speed advantage on a real Falcon.
Yep - having seen how the drawing works, it's hard to imagine the instruction cache will help. The generated code replaces data fetches with instruction fetches, and there is even some compression (move.l dx,...) and zero-cost skipping of transparent pixels. There is no practical way to make cached data fetches beat that. In most cases, relying on the instruction cache means small generic code, and that code in turn depends on data fetches going through the data cache.
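To make the generic-vs-generated contrast concrete, here is a rough C analogue. It is only a sketch under stated assumptions: 16-bit pixels, pixel value 0 as the transparent key, and hypothetical names (`draw_generic`, `compile_sprite`, `draw_compiled`, `Op`). The generic loop fetches and tests every pixel on every draw; the "compiled" form is built once per sprite, bakes the screen pitch into fixed offsets, and simply has no entries for transparent runs, so skipping them costs nothing. In the real generated 68k code the pixel values live in the instructions (or in registers), so even the `src` fetches below disappear.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_OPS 256

/* One precomputed "instruction": copy `len` opaque pixels to a fixed offset. */
typedef struct {
    int offset;           /* destination offset from the sprite origin, in pixels */
    int len;              /* length of the opaque run */
    const uint16_t *src;  /* the run's pixel values */
} Op;

/* Generic path: every sprite pixel is fetched and tested on every draw.
   Pixel value 0 is assumed to be the transparent key. */
static void draw_generic(uint16_t *dst, int pitch,
                         const uint16_t *spr, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            uint16_t p = spr[y * w + x];
            if (p != 0)
                dst[y * pitch + x] = p;
        }
}

/* "Compile" once per sprite: transparent pixels produce no ops at all,
   and the screen pitch is baked into the offsets. Returns the number of
   ops, or -1 if they do not fit in max_ops. */
static int compile_sprite(Op *ops, int max_ops, int pitch,
                          const uint16_t *spr, int w, int h)
{
    int n = 0;
    for (int y = 0; y < h; y++) {
        int x = 0;
        while (x < w) {
            while (x < w && spr[y * w + x] == 0)   /* skip transparent run */
                x++;
            int start = x;
            while (x < w && spr[y * w + x] != 0)   /* collect opaque run */
                x++;
            if (x > start) {
                if (n == max_ops)
                    return -1;
                ops[n].offset = y * pitch + start;
                ops[n].len = x - start;
                ops[n].src = &spr[y * w + start];
                n++;
            }
        }
    }
    return n;
}

/* Replay the ops: no per-pixel transparency test, skipped pixels cost nothing. */
static void draw_compiled(uint16_t *dst, const Op *ops, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        memcpy(dst + ops[i].offset, ops[i].src,
               (size_t)ops[i].len * sizeof *dst);
}
```

Both paths produce the same image; the difference is where the per-pixel work happens (every frame vs. once at compile time).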
There *is* a way to make the code generation faster - in theory - but it is even more complicated to use, and the gain would depend heavily on how much of what is being drawn gets reused.
This would involve finding 'common' pieces of sprites - especially 'repeat drawn' sprites - and encoding only small portions of a few pixels as 'generated' code, instead of the whole sprite. The instruction pattern can then be pre-touched (if necessary) and locked to keep the code in the cache if cache space is slightly exceeded, to prevent circular thrashing. An outer state manager (sprites sorted by id, or sprite 'pieces' sorted by id) then draws one common part at multiple destinations using the pre-touched instruction pattern.
This would mean the bus is occupied mainly by writes - no instruction or data fetches - for some fraction of the drawing activity. Each new pattern still incurs instruction fetching, but that cost is amortized over the subsequent copies.
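The outer state manager described above can be sketched in C as "sort the draw requests by piece id, then draw every copy of a piece back to back". This is a hypothetical illustration, not an existing implementation: `DrawReq`, `render_batched` and `count_pattern_loads` are made-up names, and a change of piece id simply stands in for fetching (or pre-touching) a new pattern into the instruction cache.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int piece_id;  /* which common sprite piece (hypothetical id scheme) */
    int x, y;      /* destination position on screen */
} DrawReq;

typedef void (*DrawFn)(int piece_id, int x, int y);

static int cmp_piece(const void *a, const void *b)
{
    int ia = ((const DrawReq *)a)->piece_id;
    int ib = ((const DrawReq *)b)->piece_id;
    return (ia > ib) - (ia < ib);
}

/* Number of times the active pattern changes when drawing in this order;
   each change stands in for fetching a new pattern into the cache. */
static int count_pattern_loads(const DrawReq *reqs, int n)
{
    int loads = 0;
    for (int i = 0; i < n; i++)
        if (i == 0 || reqs[i].piece_id != reqs[i - 1].piece_id)
            loads++;
    return loads;
}

/* Sort by piece id, then draw all copies of one piece consecutively so only
   the first copy pays for instruction fetching. `draw` may be NULL for a
   dry run. Returns the number of pattern loads in the batched order. */
static int render_batched(DrawReq *reqs, int n, DrawFn draw)
{
    assert(n >= 0);
    if (n > 0)
        qsort(reqs, (size_t)n, sizeof *reqs, cmp_piece);
    for (int i = 0; i < n; i++)
        if (draw)
            draw(reqs[i].piece_id, reqs[i].x, reqs[i].y);
    return count_pattern_loads(reqs, n);
}
```

For example, five requests alternating between two pieces cost five pattern loads in submission order but only two after sorting. One caveat of this design: sorting changes the drawing order, so it only works directly where overlapping sprites don't depend on being drawn in a particular order.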
This is a horrible thing to implement, and useless if you don't have at least 3 (maybe more!) copies of most patterns reused on the screen. OTOH, if the reuse factor rises along with the performance problems (let's say the same 10 types of sprite are used, but 30 are drawn instead of the original 10), then the advantage might be interesting.
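The "at least 3 copies" threshold can be framed as a simple break-even model. The sketch below is entirely hypothetical: F is the cost of fetching one piece's generated instructions, W the cost of the pixel writes for one copy, and O the per-copy overhead of the outer state manager, all in made-up bus-cycle units, and it assumes a batched pattern stays in the cache once fetched. Batching wins when F + k(W + O) < k(F + W), i.e. when k(F - O) > F.

```c
#include <assert.h>

/* Hypothetical cost model, all values in bus cycles:
   F = fetching one piece's generated instructions,
   W = the pixel writes for one copy,
   O = per-copy overhead of the outer state manager (sorting, dispatch). */

/* Current scheme: every copy fetches its instructions and writes pixels. */
static long cost_generated(long k, long F, long W)
{
    return k * (F + W);
}

/* Batched scheme: instructions fetched once, then writes plus overhead. */
static long cost_batched(long k, long F, long W, long O)
{
    return F + k * (W + O);
}

/* Smallest copy count k for which batching is strictly cheaper:
   F + k(W + O) < k(F + W)  <=>  k(F - O) > F.
   Returns -1 if the overhead eats the whole saving (O >= F). */
static long break_even_copies(long F, long O)
{
    assert(F > 0 && O >= 0);
    if (O >= F)
        return -1;
    return F / (F - O) + 1;
}
```

With an invented overhead of half the fetch cost (F = 100, O = 50), two copies only tie and batching starts to pay off at three, which is the same order as the estimate above; real values would have to be measured on a Falcon.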
It would be a lot of work just to find out that there isn't enough sprite reuse - and I'm fairly sure Anima has a pretty good idea of the level of reuse in the game. He'll know whether it's worth going to so much trouble - and given the number of sprite types involved and the fact that they rotate etc., it seems likely to be a dead end.
Chances are it's not going to get very much faster without changing the content of the game, or finding a direct way to omit some of the drawing. But I'll be interested to see what ideas surface on this.