Hi all. As promised, I found some time to share the techniques I used to reach the 312 sprite record. I think it's
better to explain how things were discovered.
First of all, thanks to Scum of the Earth and Phantom for beating my previous record. Without that, I would simply
*never* have spent the energy to beat my own record. That's the main point of "ultra-optimisation": you can't do it
alone, because you need other people's ideas to fight against.
When SOTE and Phantom released 269 and 270, I was a bit curious how they made it, as I was sure my 268 was not
. As I saw no big difference in Phantom's code, I guessed the difference came from the data. Back from
holiday, I started to study the Phantom screen under the SainT debugger ( you can't imagine what a motivated coder
can do under the SainT debugger: displaying CPU time, drawing the CLS map etc... ). By displaying the CLS map I
discovered that his CLS area was a little smaller than mine. It quickly appeared that my PC generator was not very
good at finding the CLS area. In fact, I had assumed I needed to clear the complete screen, but you don't need to
clear areas where a new sprite will be drawn. If S is a 32-bit value on screen and A is the 32-bit "OR" value, don't
clear if S OR A = A. I quickly changed my builder and reached 272.
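The S OR A = A test above can be written down in a few lines. This is only a minimal sketch in Python, not the actual builder code: the function name `needs_clear` is hypothetical, and S and A are assumed to be plain 32-bit integers.

```python
def needs_clear(s: int, a: int) -> bool:
    """Return True if the 32-bit screen value s must be cleared before a is OR-ed in.

    s: what is currently on screen at that position (assumed 32-bit).
    a: the "OR" value the new sprites will write there (assumed 32-bit).
    If every bit set in s is also set in a, then s OR a == a, so OR-ing
    without clearing gives the same picture as clearing first.
    """
    mask = 0xFFFFFFFF  # keep both values inside 32 bits
    s &= mask
    a &= mask
    return (s | a) != a
```

For example, `needs_clear(0x0000FF00, 0x00FFFF00)` is False because the old pixels are fully covered by the new ones, while a single stray bit such as `0x0000FF01` makes it True.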
While modifying my old generator, I noticed a part where I had written a comment in the source code: "empirical
search, should improve this". Hmm, I worked on that problem again and found a good solution using brute force. The
problem is to find the best order in which to draw the OR sprites ( after drawing the MOVE ones ) so as to remove a
maximum of ORs. Applying that new idea with a brute force search, I reached 278. All of this WITHOUT changing
anything in the runtime ST player ! ( only the PC data builder )
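The post does not spell out how the search works, so the following is only a guess at its general shape, written in Python. Everything here is an assumption: each sprite is modelled as a dict from a screen address to the bits it sets, an OR is counted as removable when its bits are already on screen, and `count_ors`, `best_order` and `base` are hypothetical names. Trying every permutation is only practical for a handful of overlapping sprites.

```python
from itertools import permutations

def count_ors(order, sprites, base=None):
    """Count the OR writes still needed when sprites are drawn in this order.

    Assumed cost model: a write is dropped when all of its bits are already
    set on screen (screen OR bits == screen).
    base: assumed screen content left by the MOVE sprites, {address: bits}.
    """
    screen = dict(base or {})
    ors = 0
    for idx in order:
        for addr, bits in sprites[idx].items():
            current = screen.get(addr, 0)
            if current | bits != current:  # the write changes something
                ors += 1
                screen[addr] = current | bits
    return ors

def best_order(sprites, base=None):
    """Brute force: try every drawing order, keep the one with the fewest ORs."""
    return min(permutations(range(len(sprites))),
               key=lambda order: count_ors(order, sprites, base))
```

With `sprites = [{0: 0b0110}, {0: 0b1111}, {1: 0b0001}]`, drawing sprite 1 before sprite 0 makes the write of sprite 0 redundant, so the best order needs 2 ORs instead of 3.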
Then I was about to release my 278 ( the record was 271 by Phantom ). The same day, before I released it, Phantom
published 280 !! Gosh, what's the new idea ??
In his 280 record, Phantom mentioned an idea from SOTE: use MOVE.L for the "move" sprites. To understand the idea:
you slow down the sprite display a bit, but you speed up the CLS a bit. Btw, I looked at Phantom's code in detail
and found that he used more sprite data than me ! How was that possible ? I looked deeper into his sprite code and
found a new technique: he has more than 16 sprite routines. He used several sprite routines, some for 16-pixel-high
sprites, some for 12, 10 etc...
I thought about that method a long time ago, but I never tried it because I was pretty sure it would require too
much memory. So how did he solve the memory problem ?
Phantom uses two techniques to save memory:
1) some of his CLS data is stored in 8 bits instead of 16. The main drawback is that he needs to "unpack" this
data to 16 bits in real time, which takes a lot of CPU.
2) He uses the "world's shortest" scroll-text ever :-)
Now let's sum up. I have 278 sprites. Phantom has 280. He uses the "MOVE.L" technique and the
"multi-sprite-routine" technique. As I didn't use either of these techniques, I was sure my data builder was far
better than his ( because I was only 2 sprites behind ).
I first tried to implement the MOVE.L technique, so I decided to completely rewrite my sprite generator. Now I have
clean and scalable C++ code, so I can test many options. With MOVE.L, I only reached 281 ( 3 sprites more than my
278 ).
Now I was sure my builder was very powerful, because I got the record (281) *without* using multi-sprite routines.
I knew the "multi-sprite" method is VERY fast, but uses a lot of memory.
To smash the record, I had to find a way to get memory ! I didn't want to use Phantom's method because a) it takes
CPU time b) it's not my own idea
The idea !
The idea came on 16.03.2005, my birthday.
As I hadn't found any useful ideas, I decided to create a text file
containing a typical CLS frame, in hexadecimal. I often noticed that eyes connected to brain are great organs to
And very often, you have to "see" data to get new ideas. Here is the sample text file:
Since my 268 record, everybody uses the same CLS routine: various JMPs into a 16-pixel-high vertical block clearing
routine. For each block, you have to record its screen position (16 bits) and a number of lines (16 bits, because
it's directly a JMP address). Phantom used 8 bits for the second value, so he has to convert it into an address in
real time.
Looking at the text file above, I noticed there are plenty of blocks 1, 2, 3 or 4 lines high. So came the idea: why
not SORT the blocks by size ?? Once sorted, if I have 12 blocks of one line each, I simply store "12" and then 12
screen positions: 2+12*2 = 26 bytes, instead of 48 bytes for the old method, or 36 bytes for Phantom's method.
Better still, to draw the 12 blocks of 1 line, I simply use a DBF (loop) instruction instead of JMP, which is 4
cycles better !! ( move.w (an)+,an and jmp (an) in the old method )
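The byte counts above (48, 36 and 26) can be checked with a small Python sketch. It only illustrates the three storage layouts as described in the text; the function names are hypothetical, the 16-bit "lines" field stands in for the real JMP address, and the screen positions are made-up values. A full frame would presumably hold one such group per block height, which is an assumption not shown here.

```python
import struct

def pack_old(blocks):
    """Old layout: 16-bit screen position + 16-bit value per block (4 bytes each)."""
    return b"".join(struct.pack(">HH", pos, lines) for pos, lines in blocks)

def pack_8bit(blocks):
    """Phantom-style layout: 16-bit screen position + 8-bit line count (3 bytes each)."""
    return b"".join(struct.pack(">HB", pos, lines) for pos, lines in blocks)

def pack_size_group(positions):
    """Sorted layout for one block height: a 16-bit count, then the positions only."""
    return struct.pack(">H", len(positions)) + struct.pack(f">{len(positions)}H", *positions)

# Twelve blocks of one line each, at made-up screen offsets.
blocks = [(i * 160, 1) for i in range(12)]
positions = [pos for pos, _ in blocks]

print(len(pack_old(blocks)), len(pack_8bit(blocks)), len(pack_size_group(positions)))
```

The line count no longer needs to be stored per block because it is implied by the group the block belongs to, which is where the memory gain comes from.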
That is the main idea: I gained 47Kb of memory, and the routine is a bit FASTER !!!
Then I quickly coded a brute force search over all the different sprite routines, and the result is: 312 !!
The main constraint now is only the 512Kb limit. Whoever finds a new method to gain memory will be the future
record holder. I'm sure 312 can be beaten with these new methods ( I was not sure about that for my 268 record ). I
now have plenty of "empirical" choices in my data builder, and I think I can easily get 2 or 3 more sprites by
carefully fine-tuning all the generation parameters.
And if someone finds a new idea, I'm sure we can put more and more sprites on screen !
Ultra-optimisation rules !!