New thread about 16*16 sprite record

GFA, ASM, STOS, ...

Moderators: simonsunnyboy, Mug UK, Zorro 2, Moderator Team

Thierry Kormann
Atari nerd
Atari nerd
Posts: 45
Joined: Sun Oct 10, 2004 3:10 pm
Location: France

Post by Thierry Kormann »

leonard wrote:Ok here is the new blasting 16*16 sprite record ! I tryed to add some design to please a large audience :-)
Congratulations Leonard. Once again, an impressive work! and nice design BTW...
Demon from the Conceptors
User avatar
Zorro 2
Administrator
Administrator
Posts: 2281
Joined: Tue May 21, 2002 12:44 pm
Location: Saint Cloud (France)
Contact:

Re: plop

Post by Zorro 2 »

C-Rem wrote:it's cool to see great coders working on a record ... but as i'm graphician , i prefer multiparts ...

i dream of a french multipart demo ...

We have : excellent active coders :

- Leonard ... Pht ... Chuck ... STghost ... Frost242 ... tobe ... gloky ... GT-Turbo ... etc

- Musicians : Dma-sc ... Floopy ... etc

- Graphicians : Exocet ... Mic ... STS ... EDO ... etc

well ... it's just an old dream
I'm remember too, old dream, an mjj french multipart, since 2003, there's still screens about some mjj "coders"...

You can remove GT-Turbo on your proper list, he codes only for Falcon/STE games...
.~^ Member of NoExtra Team :: Member of HMD :: Pouet :: DemoZoo :: T-shirt Sponsor ^~.
User avatar
leonard
Moderator
Moderator
Posts: 678
Joined: Thu May 23, 2002 10:48 pm
Contact:

Post by leonard »

Hi all. As promised I find some time to share technics I find to reach the 312 sprite record. I think it's better to

explain how things were discovered.

First of all, thanks to Scum of the Earth and Phantom to beat my previous record. Without it, I simply *never* spend

energy to beat my own record. That's the main point of "ultra-optimisation" : you can't do it if you're alone

because you can't fight with other people ideas.

When SOTE and Phantom released 269 and 270, I was a bit curious how they made it, as I was sure my 268 was not

beatable :-). As I don't see some big difference in Phantom code, I guessed the difference came in the data. Back to

holiday, I started to study the Phantom screen under SainT debugger. ( you can't imagine what a motived coder can do

under saint debugger :-) Displaying CPU time, drawing the CLS map etc...). By displaying the CLS map I discovered

that the area was a little smaller than mine. Fastly, it appairs my pc generator was not very good at finding CLS

area. In fact, I supposed I wanted to clear the complete screen. In fact, you don't need to clear areas where new

sprite will be draw. If S is a 32bits on screen and A is the 32bits "OR" value, don't clear if S OR A = A. I fastly

change my builder and reached 272.

As I modify my old generator, I noticed a part where I wrote a comment in the source code "empirical search, should

improve this". Humm, I worked on that problem again and I find a good solution using brute force. The problem is to

find the best order to draw OR sprites ( after drawing MOVE ) to remove a maximum of OR. Applying that new idea

using brute force search, I reached 278. Everything WITHOUT changing anything in the runtime ST player ! ( only the

PC data builfder)

Then, I was about to release my 278 ( record was 271 by Phantom ). The same day, before I release, Phantom published

280 !! Gosh, what's new idea ??

In the Phantom 280 record, he spoke about an idea from SOTE to use MOVE.L for "move" sprite. To enderstand the idea,

you just slow down a bit the sprite display, but you speedup a bit the CLS. Btw, I looked at Phantom code in details

and find that he used more sprite data than me ! how was it possible ? I looked deeply in its sprite code and find a

new technic: he has more than 16 sprites routines. He used several sprite routines, some for 16pixels height sprite,

some for 12, 10 etc...

I though about that method long time ago, but I never tryed it because I was pretty sure that it will required too

much memory. So how did he solved the memory problem ?

Phantom uses two technics to preserve memory:
1) some of its CLS data are stored into 8bits instead of 16. The main drawback is that he needs to "unpack" these

data to 16bits in real time, and it takes lot of CPU.
2) He uses a "world shortest" scroll-text ever:-)

----------------------------------------

Now let's resume. I have 278 sprites. Phantom has 280. He uses the "MOVE.L" technic and the "multi-sprite-routine"

technic. As I don't use any of these technics, I was sure my databuilder were far better than him ( because I was

only 2 sprites bellow).

I firstly tryed to implement the MOVE.L technic, so I decided to completly re-write my sprite generator. Now I have

a clean and scalable C++ code so I can test many options. with MOVE.L , I only reached 281 ( 3 sprites more than my

previous 278)

Now I was sure my builder was very powerfull, because I get the record (281) *without* using multi-sprite routines.

I know "multi-sprite" method is VERY fast, but use a lot of memory.

To splash the record, I had to find a way to get memory ! I don't wanted to use Phantom method because a) it takes

CPU time b) it's not my own idea :-).

The idea !
----------

The idea came 16.03.2005, my birthday date :-) As I don't find any usefull ideas, I decide to create a text file

containing a typical CLS frame, in hexadecimal. I often noticed that eyes connected to brain are great organs to

find ideas :-) And very often, you have to "see" datas to get news ideas. Here is the sample text file:

http://leonard.oxg.free.fr/record16/clstest.txt

Since my 268 record, everybody use the same CLS routine: various JMP in a 16pixels vertical block clearing routine.

For each block, you have to record its screen position (16bits) and a number of line (16bits, because it's a JMP

adress directly). Phantom used 8bits for the second data, so he has to convert it into adress in real time.

Looking at the text file upper, I noticed there is plenty blocks of 1,2,3 or 4 lines high. So came the idea: Why not

SORTING block per size ?? When sorted, if I have 12 blocks of one line each, I simply store "12" and then 12 screen

positions: 2+12*2 =26bytes , instead of 48bytes for old method, or 36bytes for Phantom method.

Better, to draw the 12 blocks of 1 line, I simply use DBF (loop) instruction instead of JMP, wich is 4 cycles better

!! ( move.w (an)+,an and jmp(an) in the old method)

That is the main idea : I get 47Kb of memory, and the routine is a bit FASTER !!!

Then I fastly code a brute force research for all different sprite routines, and that is: 312 !!


The future
----------

The main constraint now is only the 512Kb limitation. People finding new method to get memory will be the future

record owner. I'm sure 312 can be beaten with these new methods ( I was not sure about it for my 268 record). Now I

have plenty of "empirical" choice in my databuilder, and I think I can easyly get 2 or 3 sprites more by carefully

finetuning all generation parameters.

And if someone find a new idea, I'm sure we can put more and more sprites !

Ultra-optimisation rules !!

Leonard/Oxygene.
Leonard/OXYGENE.
p01
Captain Atari
Captain Atari
Posts: 158
Joined: Mon Nov 22, 2004 1:27 am
Location: Oslo, Norway
Contact:

Post by p01 »

Thank you for sharing your explorations.
Image
pht
Atari freak
Atari freak
Posts: 55
Joined: Mon Aug 30, 2004 10:30 am
Contact:

Post by pht »

Thanks a lot for this code analysis !
By displaying the CLS map I discovered that the area was a little smaller than mine... If S is a 32bits on screen and A is the 32bits "OR" value, don't clear if S OR A = A
After your idea of clearing with 'MOVE display', I thought with 'OR display', all screen does not need to be clear.
I don't wanted to use Phantom method because a) it takes CPU time...
In fact, it takes no real needed CPU time : my whole code is in VBL vector $70.w and my "unpacked" buffer is created during the remaining time. So sometimes CPU can unpack (some frame are ending far before VBL ending) and sometimes it can't (some frame just fit in VBL). To cure that, buffer must have advance : exactly 10 frames ahead (for the 280 record)

Anyway, I am struck by the speed you modify your code ! (a week to burn out a record) When I have an idea, I spend so many hours on coding it (I'm trying for the moment a new promising idea on clearing... but not as revolutionary as yours)

But 30 sprites more can not only be found in clearing optimisation, I think the main optimisation will be in the databuilder search of OR display.
User avatar
leonard
Moderator
Moderator
Posts: 678
Joined: Thu May 23, 2002 10:48 pm
Contact:

Post by leonard »

In fact, it takes no real needed CPU time : my whole code is in VBL vector $70.w and my "unpacked" buffer is created during the remaining time
Oh yes I'm aware of your "multi-thread" mechanism ( I like spent hours under SainT debugger :-)). But, that is:it *takes* cpu time. I mean, if a frame is faster than 1 vbl, I agree you compute things for "free". But if you don't compute it, you can use that "free time" for other data process. So it takes time ( but it's true you get a good idea by making that process in another thread routine).
But 30 sprites more can not only be found in clearing optimisation, I think the main optimisation will be in the databuilder search of OR display.
You're right. My CLS technic is *only* usefull for memory saving. By chance, it does not take more CPU time than my old routine, but the main goal is memory saving.
My databuilder use several "intelligent" and "brute force" search. "Intelligent" search is used for "MOVE" finding, and brute force is used for "OR" searching and "multi-sprite" routine search.
the whole code is written in C++ with Visual c++ 6.0 and computes data in 30 seconds in my Intel P4 - 2Ghz. the code is not written in an "optimized" way but the Microsoft C++ compiler is very efficient.

Anyway, I am struck by the speed you modify your code
Well I'm a tricker. I spend more time on finding ideas than transform it into 68k code. For the code, I use VisualC++ with an 68k assembler written by Ziggy Stardust. The assembling is very fast, and "errors" appairs in the visual c++ IDE. Then I have my demo system and kernel, I can create a bootable disk image with a simple batch file, and launch it under SainT ( very cool to debug ).
Working with these tools are *far* better (and faster) than in time I write O-Demo or Flipo demo on my 1040-Floppy disk only STE :-)
Leonard/OXYGENE.
Pink/RG
Moderator
Moderator
Posts: 32
Joined: Tue Dec 17, 2002 5:54 pm
Location: Guildford, UK
Contact:

Post by Pink/RG »

This Ziggy Stardust 68k assembler sounds like a useful tool - is it available anywhere?
User avatar
leonard
Moderator
Moderator
Posts: 678
Joined: Thu May 23, 2002 10:48 pm
Contact:

Post by leonard »

Well, I guess it should be freeware, maybe you can find it on the web. anyway I attached my version. It was modifyed by Ben/OVR ( sc68 author).

I just use "as68 sourcecode.s" and it produces "sourcecode.bin". I assemble at fixed adress in my demo kernel ( using "org" directive.

It's not "devpac" compatible, so you have to change source code a bit. For exemple "equates" are written like:

NBSPRITE = 312

"repeat" and "if" directives are used like:

repeat NBSPRITE
{
nop
}

if (NBSPRITE=313)
{
; woow !
}

Get it here :

http://leonard.oxg.free.fr/download/as68.zip
Leonard/OXYGENE.
ParaPete
Retro freak
Retro freak
Posts: 10
Joined: Wed Jan 12, 2005 3:42 pm
Location: Milton Keynes, UK

Post by ParaPete »

leonard wrote: Well I'm a tricker. I spend more time on finding ideas than transform it into 68k code. For the code, I use VisualC++ with an 68k assembler written by Ziggy Stardust. The assembling is very fast, and "errors" appairs in the visual c++ IDE. Then I have my demo system and kernel, I can create a bootable disk image with a simple batch file, and launch it under SainT ( very cool to debug ).
You mean to say you've got a set up on your PC where you can write your code in MSVC++, hit execute, and have saint run the assembled code all automatically? That'd be awesome! If this is the case then I'd be very interested in seeing how to set this up.
User avatar
leglod
Captain Atari
Captain Atari
Posts: 437
Joined: Sun Nov 28, 2004 9:34 am
Location: Montpellier france
Contact:

Post by leglod »

Bonsoir, je me suis poser plusieur fois la question a savoir si le record pouvait etre facilement battue en utilisant le Blitter des STE ???
User avatar
NiceGuyUK
Atari Super Hero
Atari Super Hero
Posts: 587
Joined: Thu Nov 25, 2004 1:03 pm
Location: Kent, England
Contact:

Post by NiceGuyUK »

leglod wrote:Bonsoir, je me suis poser plusieur fois la question a savoir si le record pouvait etre facilement battue en utilisant le Blitter des STE ???
(very) Rough translation :- Evening. I asked several times if this record could be easily beaten using the Blitter of the STE
Also known as Bossman of RiFT/BitBendaz - there's no "rename" option on the forums for my *very* old username
User avatar
leonard
Moderator
Moderator
Posts: 678
Joined: Thu May 23, 2002 10:48 pm
Contact:

Post by leonard »

I asked several times if this record could be easily beaten using the Blitter of the STE
Contest rule number 2 specify it should run on 520 STF machine.

Anyway the question is interesting. I think the answer is "yes", but probably not very many sprites. Blitter could be used to clear some columns (the higgest ones to be efficient). I don't think it's efficient to draw 16*16 non masked sprites . ( even those displayer with OR). Our specific CPU routines are already faster than blitting the complete 32*16 area.

Someone wanted to open a new contest on 520 STE ? :-)
Leonard/OXYGENE.
punkrulesok
Captain Atari
Captain Atari
Posts: 237
Joined: Tue Aug 05, 2003 7:34 pm

Post by punkrulesok »

The guy who seems to know most about Blitter coding is the guy who wrote Roger :) I think he could say for sure if a new contest is worth it :)
User avatar
Zorro 2
Administrator
Administrator
Posts: 2281
Joined: Tue May 21, 2002 12:44 pm
Location: Saint Cloud (France)
Contact:

Post by Zorro 2 »

punkrulesok wrote:The guy who seems to know most about Blitter coding is the guy who wrote Roger :) I think he could say for sure if a new contest is worth it :)
Roger has been creating in GFA BASIC, maybe the new rules can make effect in da GFA :?:
Last edited by Zorro 2 on Mon Mar 21, 2005 11:21 am, edited 1 time in total.
.~^ Member of NoExtra Team :: Member of HMD :: Pouet :: DemoZoo :: T-shirt Sponsor ^~.
p01
Captain Atari
Captain Atari
Posts: 158
Joined: Mon Nov 22, 2004 1:27 am
Location: Oslo, Norway
Contact:

Post by p01 »

a sprite record in GFA :wink:
AFAIR, Chuck + Evil Metal/Dune made 42 sprites in 1 VBL. At the time I was stuck at 40.

a controlable lissajou curve in GFA ? I made one with 450 or 500 ( don't remember exactly :P but the previous "record" was 300 ) in No Flowers ( press Tab, F1 - F8 during the effect with the redish background )
Image
User avatar
leglod
Captain Atari
Captain Atari
Posts: 437
Joined: Sun Nov 28, 2004 9:34 am
Location: Montpellier france
Contact:

Post by leglod »

Yes a new Category for 520 STf and 520 STE ... :mrgreen:
MetalGuru
Retro freak
Retro freak
Posts: 15
Joined: Thu Sep 02, 2004 9:17 am
Location: Sweden

Post by MetalGuru »

Wow, good work Leonard, it´s really a great pice of code.... 8O


/MetalGuru
User avatar
leglod
Captain Atari
Captain Atari
Posts: 437
Joined: Sun Nov 28, 2004 9:34 am
Location: Montpellier france
Contact:

Post by leglod »

I propose to you to change Sprites for version STE, with the choice Atari Bombe, Atari Logo or butterfly... :roll:
User avatar
ggn
Atari God
Atari God
Posts: 1260
Joined: Sat Dec 28, 2002 4:49 pm

Post by ggn »

leglod,

If he does change the sprite, then the same code will produce less (or more) sprites!
is 73 Falcon patched atari games enough ? ^^
pht
Atari freak
Atari freak
Posts: 55
Joined: Mon Aug 30, 2004 10:30 am
Contact:

Post by pht »

After I was struck by leonard (312 balls record) 8O...
I stopped any activity on this fornicating stupid record !
But the madness of the code recalled me.

Here is a point of my code advance :
Clearing code was nicely optimized
ImageImage
5 gained rasters lines on my previous (non published) 283 record !
My 36bytes unit clearing datas storing have even been reduced (I did not calculate but average rate should be 28 bytes ?!)

But that is not enough !

I've tested the OR and MOVE display code(leonard one and mine). They are quite similar and CPU time used is nearly the same (the difference time is unimportant)

Now, I have to work on my databuilder. It is very bad in finding OR incomplete sprite.
Leonard used 3 types of OR sprite (1 complete and 2 incomplete = 4 top lines sprite & 4 bottom lines sprite suppressed)
I used 5 differents types of OR sprite for my 283 record.

BTW, Leonard, I have noticed a strange thing in your OR display code.
The 7th shift of the complete sprite :
MOVEQ #$F0,D1
MOVEQ #$C0,D6
MOVEM.L L6A08A(PC),D0/D3-D4
ADDA.W (A4)+,A6
JMP (A6)
...

Your sprite is bugged : to be complete, 2 instructions are missing :
or.l d0,5*160(a6)
or.l d0,10*160(a6)

Did you do that on purpose ?
Strangely, I did not find any visible incidence on curve display... but it should ?!
What's up with this?
"Then I fastly code a brute force research for all different sprite routines"
Number of permutations should be 312 ! = 2,1.10^644 (or would it be 2^312 = 8,34.10^93 ... mathematic gaps)
Anyway brute force research is just impossible... especially if databuilder is running for less than a few minutes...
You should have reduce in a drastic way the number of sprite (intelligent or empirical way)
I am running out of idea for the moment...

That's all.
User avatar
leonard
Moderator
Moderator
Posts: 678
Joined: Thu May 23, 2002 10:48 pm
Contact:

Post by leonard »

Your sprite is bugged : to be complete, 2 instructions are missing :
or.l d0,5*160(a6)
or.l d0,10*160(a6)

Did you do that on purpose ?
Strangely, I did not find any visible incidence on curve display... but it should ?!
Maybe you don't notice visible incidence just because there is no bugs :-) I guess you don't look carefully enough.
If you run the demo , in 512Kb mode in emulator, just go at adress $69298 ( first sprite rout for complete Or shift 7), you can read:

add.w (a5)+,a4
or.l d0,$640(a5) ; 10*160
or.l d0,$320(a5) ; 5*160

these instructions simply exist in the code :-)


Number of permutations should be 312 ! = 2,1.10^644 (or would it be 2^312 = 8,34.10^93 ... mathematic gaps)
Anyway brute force research is just impossible
Totally agree. 312! is totally out of order of any computer (or group computer on that planet).

The fact is I use "brute force" word too early. To me, as soon as I reach n^2 complexity, it's brute force :-) ( optimizing rules). n! is totally un-reachable and I don't think about these routs in general. I just try to avoid n^2 routines.
That's why in my case, my OR research ( but my MOVE too) works in n^2, wich is only 312*312 = 97344, wich is quite fast on today PC.

Maybe to give an idea: you first though of 312! because you guessed the sprite are masked. Just remember the sprite are displayed with OR, it may help you finding the 312*312 only complexity.

Btw it's really cool to note that this stupid record is quite interesting, just because many tricks reside in the databuilder. Remember old demos ? As soon as a guy release a 3d rout, other grab the code and get the same. With the sprite record, optimisations are now located inside a *non released" code, wich is quite funny ( philosophicaly speaking :-))
Leonard/OXYGENE.
pht
Atari freak
Atari freak
Posts: 55
Joined: Mon Aug 30, 2004 10:30 am
Contact:

Post by pht »

Maybe you don't notice visible incidence just because there is no bugs I guess you don't look carefully enough.
oups yep, you are right ! ... these bloody sprite datas insertion in the code just get me wrong !

I read this

L6A082 MOVEM.L L6A08A(PC),D0/D3-D4
L6A088 ADDA.W (A4)+,A6
L6A08A JMP (A6)
L6A08C BSET D0,
L6A08E ORI #$18F,SR
L6A092 ORI #$FF,SR
L6A096 ORI.B #$DC,$FFFF81AD.W
L6A09C ADDI.W #$81AD,D0
L6A0A0 BTST D1,-(A0)
L6A0A2 OR.L D3,$5A0(A5)
L6A0A6 OR.L D3,$500(A5)


instead of this

L6A082 MOVEM.L L6A08A(PC),D0/D3-D4
L6A088 ADDA.W (A4)+,A6
L6A08A JMP (A6)
L6A08C SPRITE_DATAS
...
L6A098 ADDA.W (A4)+,A5
L6A09A OR.L D0,$640(A5)
L6A09E OR.L D0,$320(A5)
L6A0A2 OR.L D3,$5A0(A5)
L6A0A6 OR.L D3,$500(A5)


And of course I did not see any visible incidence... It was really curious.
Sorry !
Maybe to give an idea: you first though of 312! because you guessed the sprite are masked. Just remember the sprite are displayed with OR, it may help you finding the 312*312 only complexity.
Ok, you really reduce (in an intelligent way) the number of permutations.
Thanks I will think about this tips !
just because many tricks reside in the databuilder
For sure, databuilder is even the only goal for the moment... since my 68000 code did not change for a long time :)

Well... Visual Basic should be my near future :)
User avatar
leonard
Moderator
Moderator
Posts: 678
Joined: Thu May 23, 2002 10:48 pm
Contact:

Post by leonard »

For sure, databuilder is even the only goal for the moment... since my 68000 code did not change for a long time
I guess now there is only two ways of improving records:


a) Find new ideas to improce data generation

b) Have a totally new idea about sprite displaying / clearing ( and so heavily change the ST runtime code). That's the kind of idea I would like to have :-) ( basically the generated sprite code technic have not changed since TCB)

c) Have an idea to get memory so you can use more sprite routs ( I only use 3)


Actually I don't see any improvment in my databuilder. ( but I don't see any improvment when I was at 268 so I'm not sure about it :-))
Leonard/OXYGENE.
User avatar
leonard
Moderator
Moderator
Posts: 678
Joined: Thu May 23, 2002 10:48 pm
Contact:

Post by leonard »

well I don't know how to count to 3..
Leonard/OXYGENE.
GwL
Atarian
Atarian
Posts: 4
Joined: Sat Oct 23, 2004 3:22 pm
Location: France

Post by GwL »

leonard wrote:well I don't know how to count to 3..
Some times i have a look to this forum to read the last news concerning the sprite record.
I keep in mind some ideas to reach this number 3, but Unfortunately I have no time to code these ... maybe one day.

Congratulations to pht and you for this quest . see you (soon i hope) in this forum

Gwl

(PS : Just a question, i know you already answer this one but i forgot it. How many cycles in one vbl ? )
Post Reply

Return to “Coding”