STE Blitter

keops
Atari Super Hero
Posts: 609
Joined: Mon Jul 26, 2004 3:39 pm
Location: Canada

STE Blitter

Post by keops »

Hello,

I never made STE-specific stuff and was never attracted to it back in the day. However, with the lockdown, I have a little bit of spare time to try STE stuff, so here are a few total newbie questions that will save me some time :D

- When using the Blitter, is the CPU partially / totally unavailable? If I use the entire machine time during the frame to blit sprites or copy stuff using the Blitter, will I have any 68000 time left to do anything?

- What is the theoretical limit regarding the number of sprites per frame? Same with raw copy of stuff on screen (not masked or anything)

- What's the handiest way to use the Blitter to fill polygons? Can it fill vertically / horizontally?

- Can it do "localized" hardware scroll or does it scroll the entire screen through something like video memory address changes?

- Any other interesting stuff the Blitter is capable of I should know about?

And finally, is there any reference / available source code showcasing efficient use of the Blitter I could have a look at?

Thanks a lot in advance
CiH
Atari God
Posts: 1163
Joined: Wed Feb 11, 2004 4:34 pm
Location: Middle Earth (Npton) UK

Re: STE Blitter

Post by CiH »

Hi,

This might go a long way to helping.

http://www.atari-wiki.com/index.php/Ata ... _/_Paradox
"Where teh feck is teh Hash key on this Mac?!"
fenarinarsa
Atari freak
Posts: 55
Joined: Sat Mar 15, 2014 11:23 pm

Re: STE Blitter

Post by fenarinarsa »

keops wrote: Tue Jun 16, 2020 3:43 pm - When using the Blitter, is the CPU partially / totally unavailable? If I use the entire machine time during the frame to blit sprites or copy stuff using the Blitter, will I have any 68000 time left to do anything?
The general rule is that when the Blitter runs, it takes control of the bus away from the 68000.
If you run the Blitter in HOG mode, the CPU is completely paused until it can access RAM again. The only way around this is to start the Blitter and follow the starting instruction immediately with one long-running instruction that doesn’t need RAM access. That instruction will still execute because the 68000 already has it in its prefetch queue. Typically a mul or div instruction fits nicely there.
I guess the Mega STE can also take advantage of its cache as long as no instruction tries to write to memory.
That's why there's a Blit mode in which the Blitter releases the bus after 64 bus cycles and leaves 64 bus cycles to the CPU. The CPU in turn can restart the Blitter immediately. Another way is to run small blits in HOG mode, like Leonard did in WeWere@.

Paranoid's FAQ has a small error, by the way: if a disk transfer is in progress, the DMA has priority over the Blitter. But anything related to the CPU (interrupts etc.) doesn't.
- What is the theoretical limit regarding the number of sprites per frame? Same with raw copy of stuff on screen (not masked or anything)
I guess it can be derived from Blitter's execution timings: http://retrospec.sgn.net/users/tomcat/m ... tm#BLITTER

For instance in raw copy, you can transfer 32000 bytes in 32000 NOPs. [16000 words, LOP=3/HOP=2: 2 NOPs per word]
That's the beauty of the Blitter, loops are free.
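A quick back-of-the-envelope check of that figure, as a sketch only. The frame budget used here is an assumption that is not stated in this thread: a 50 Hz PAL frame is taken as 313 lines of 512 CPU cycles, with 1 NOP = 4 cycles.

```python
# Rough cost of a full-screen raw copy compared with one 50 Hz frame.
# Assumption (not from the thread): a PAL frame is 313 lines x 512 cycles.
CYCLES_PER_NOP = 4
FRAME_NOPS = 313 * 512 // CYCLES_PER_NOP


def raw_copy_nops(num_bytes):
    """One source read and one destination write per 16-bit word, 1 NOP each."""
    words = num_bytes // 2
    return words * 2


screen = raw_copy_nops(32000)
print("full-screen copy:", screen, "NOPs")
print("frame budget    :", FRAME_NOPS, "NOPs")
print("share of frame  : %.0f%%" % (100.0 * screen / FRAME_NOPS))
```

Under that assumption a single full-screen copy already eats most of a frame, so in practice you only blit the areas that actually change.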
- What's the handiest way to use the Blitter to fill polygons? Can it fill vertically / horizontally?
I guess horizontally, using endmasks and fill mode (maybe +halftone)? You'll need as many passes as there are bitplanes. But some Blitter parameters aren't modified from one blit to the next, so for similar blits you only need to update a few registers before restarting it.
- Can it do "localized" hardware scroll or does it scroll the entire screen through something like video memory address changes?
All video registers are R/W on the STE, so you can change the video counter mid-screen. I guess you can do it during the display too, with unexpected results.
- Any other interesting stuff the Blitter is capable of I should know about?
I'm no specialist but I read some interesting things about advanced topics, like using the Halftone registers as a LUT...
Cyprian
10 GOTO 10
Posts: 1964
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: STE Blitter

Post by Cyprian »

keops wrote: Tue Jun 16, 2020 3:43 pm Hello,

I never made STE specific stuff and never got attracted to it back in the days. However with the lock-down, I have a little bit of spare time to try STE stuff, so here are a few totally newbie questions that will save me some time :D

- When using the Blitter, is the CPU partially / totally unavailable?' If I use the entire machine time during the frame to blit sprites or copy stuff using the Blitter, will I have any 68000 time left to do anything?
You can run one CPU instruction in parallel with the BLiTTER, e.g. MUL/DIV:
viewtopic.php?p=96197#p96197
keops wrote: Tue Jun 16, 2020 3:43 pm - What is the theoretical limit regarding the number of sprites per frame? Same with raw copy of stuff on screen (not masked or anything)
Below are some tests by our forum colleague @Anima:

Atari STE: 23 32 x 32 Pixel Blitter objects @ 50 Hz using 4 bitplanes + DMA Sample + Rasters:
https://www.youtube.com/watch?v=UotlGkgesbU

Final Fight - Atari STe sprite test by Anima:
https://www.youtube.com/watch?v=9ZIs-NQHZKc

keops wrote: Tue Jun 16, 2020 3:43 pm - What's the handiest way to use the Blitter to fill polygons? Can it fill vertically / horizontally?
fill line by line, four bitplanes at once: https://www.pouet.net/prod.php?which=68506
keops wrote: Tue Jun 16, 2020 3:43 pm - Can it do "localized" hardware scroll or does it scroll the entire screen through something like video memory address changes?
scroll is done on BLiTTER's Line level

keops wrote: Tue Jun 16, 2020 3:43 pm - Any other interesting stuff the Blitter is capable of I should know about?
my small list (not complete I guess):
- Multicolor gfx: https://github.com/zerkman/mpp
- fast DMA channel for IDE: Pera Putnik disk driver;
- Gouraud shading: http://s390174849.online.de/ray.tscc.de/gouraud.htm
- Audio manipulation & channel mixing: https://www.youtube.com/watch?v=ehSvjL8RLo4 https://www.youtube.com/watch?v=Xc0zv4YFitI
- C2P pass - shift and merge with the BLiTTER
keops wrote: Tue Jun 16, 2020 3:43 pm And finally, is there any reference / available source code showcasing efficient use of the Blitter I could have a look at?
as above
Portfolio / Lynx II / Jaguar / TT030 / Mega STe / 800 XL / 1040 STe / Falcon030 / 65 XE / 520 STm / SM124 / SC1435
SDrive / PAK68/3 / Lynx Multi Card / LDW Super 2000 / XCA12 / SkunkBoard / CosmosEx / SatanDisk / UltraSatan / USB Floppy Drive Emulator / Eiffel / SIO2PC / Crazy Dots / PAM Net
Hatari / Steem SSE / Aranym / Saint
http://260ste.appspot.com/
keops
Atari Super Hero
Posts: 609
Joined: Mon Jul 26, 2004 3:39 pm
Location: Canada

Re: STE Blitter

Post by keops »

Thanks a lot guys!
Cyprian
10 GOTO 10
Posts: 1964
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: STE Blitter

Post by Cyprian »

a new prod from Equinox?
keops
Atari Super Hero
Posts: 609
Joined: Mon Jul 26, 2004 3:39 pm
Location: Canada

Re: STE Blitter

Post by keops »

Just spare time fun with Devpac for the moment ;)
leonard
Moderator
Posts: 665
Joined: Thu May 23, 2002 10:48 pm

Re: STE Blitter

Post by leonard »

Hi Keops!

Almost everything has been said here. Let me try to add a few details.
keops wrote: Tue Jun 16, 2020 3:43 pm - When using the Blitter, is the CPU partially / totally unavailable? If I use the entire machine time during the frame to blit sprites or copy stuff using the Blitter, will I have any 68000 time left to do anything?
The Blitter can run in exclusive mode or shared mode. Shared mode is almost useless in a demo: the Blitter and the CPU take turns on the memory bus, 64 cycles each. It can be useful if you need interrupts, like rasters or something. There is a trick to restart the Blitter as soon as the CPU gets its 64 cycles. But as I said, this is almost useless in a "demo" context, where exclusive mode is the fastest.

Counting cycles is really easy: each memory word access takes 1 NOP. So if a Blitter operation like "OR" needs to read the source, read the destination and write the destination, it takes 3 NOPs per word. The Blitter takes 2 NOPs to start (3 NOPs on the Mega STE).
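That counting rule can be sketched as a tiny estimator. The operation table and the function name `blit_nops` are only illustrative; the per-word access counts and the start-up cost follow the rule given above.

```python
# Bus accesses per destination word, using the "1 NOP per word access" rule.
# The table is illustrative; the names are not Blitter register names.
ACCESSES_PER_WORD = {
    "clear": 1,  # write destination
    "copy": 2,   # read source, write destination
    "or": 3,     # read source, read destination, write destination
}


def blit_nops(op, words, mega_ste=False):
    """Estimated cost in NOPs of one exclusive-mode blit, start-up included."""
    start = 3 if mega_ste else 2
    return start + ACCESSES_PER_WORD[op] * words


# One bitplane of a 32x32 block: 2 words per line, 32 lines.
words = 2 * 32
for op in ("clear", "copy", "or"):
    print(op, blit_nops(op, words), blit_nops(op, words, mega_ste=True))
```

The fixed start-up cost is why many tiny blits are relatively more expensive than a few large ones.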
keops wrote: Tue Jun 16, 2020 3:43 pm- What is the theoretical limit regarding the number of sprites per frame? Same with raw copy of stuff on screen (not masked or anything)
It used to be quite low (lower than optimized generated code), but a recent discovery makes it way faster than generated code. The Blitter has native "mask" support (the Amiga doesn't :) ). The idea is to set the mask registers for each line of the sprite.
https://youtu.be/UotlGkgesbU manages to display 23 sprites of 32*32 pixels in 4 bitplanes. That's really fast (really close to the Amiga).
keops wrote: Tue Jun 16, 2020 3:43 pm- What's the handiest way to use the Blitter to fill polygons? Can it fill vertically / horizontally?
There is no fill function in the Blitter (unlike the Amiga). So you just use the Blitter to do a "vertical fill" using XOR. Of course you need to draw the lines with the CPU.
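To show the idea behind an XOR vertical fill, here is a small simulation in Python rather than real Blitter code. Each scanline of one bitplane is modelled as an integer bitmask, and the outline layout (closing edge drawn one row below the last filled row) is an assumption chosen for the example.

```python
def xor_fill(rows):
    """XOR every scanline with the already-filled line above it.

    rows: list of ints, one bitmask per scanline of a single bitplane.
    A blit with source = line y-1, destination = line y and an XOR
    operation performs the same step for a whole line of words at once.
    """
    filled = list(rows)
    for y in range(1, len(filled)):
        filled[y] ^= filled[y - 1]
    return filled


# Outline of a rectangle: opening edge on row 1, closing edge on row 4.
outline = [0b000000, 0b011110, 0b000000, 0b000000, 0b011110, 0b000000]

for row in xor_fill(outline):
    print(format(row, "06b"))
```

Every column is switched on by the first edge pixel it meets and switched off by the next one, which is why the edges have to be plotted so that each column crosses the outline an even number of times.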
keops wrote: Tue Jun 16, 2020 3:43 pm- Can it do "localized" hardware scroll or does it scroll the entire screen through something like video memory address changes?
You can change the video pointer anywhere in the frame. Of course, if you change it in the middle of a line, you have to be very careful (and obviously test the result on real hardware; emulators could have some issues with changing the address in the middle of a line).
keops wrote: Tue Jun 16, 2020 3:43 pm- Any other interesting stuff the Blitter is capable of I should know about?
I don't know, you can clear, and you can "or" buffers quite fast. As a rule of thumb, the Blitter does exactly what the CPU could do, but without the cost of CPU instruction fetching and decoding (so it's as if you had an instruction cache and CPU instructions were free). For instance, clearing one bitplane with the CPU costs 3 NOPs per word ( move.w dn,xx(an) ): 2 NOPs to fetch the instruction and 1 NOP to write. With the Blitter it's only 1 NOP (no more instruction decoding).
keops wrote: Tue Jun 16, 2020 3:43 pmAnd finally, is there any reference / available source code showcasing efficient use of the Blitter I could have a look at?
When writing We Were @ I did some tests from scratch. It's quite easy, there is only a small number of registers to set up. The only thing you have to remember is that when you compute the "modulo" per line, the last word of a line doesn't apply the X increment. Imagine you want to clear a block 64 pixels wide: that means 4 words per line. You set the X increment to 8 bytes. If the screen line is 160 bytes, then at the end of a line the Blitter's internal pointer has advanced by 3*8 bytes (and *not* 4*8 bytes). So you have to set the modulo to 160-3*8.
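The modulo rule can be written down as a one-line helper. This is only a sketch: the function and parameter names are made up for the example, and it simply encodes the point that the last word of a line does not apply the X increment.

```python
def line_modulo(line_bytes, words_per_line, x_increment):
    """Value to add after the last word so the pointer lands on the next line.

    Only (words_per_line - 1) X increments are applied within a line.
    """
    return line_bytes - (words_per_line - 1) * x_increment


# 64-pixel-wide block, one bitplane of a 4-bitplane low-res screen:
# 4 words per line, 8 bytes between words, 160 bytes per screen line.
print(line_modulo(160, 4, 8))
```

For the example in the post this gives 160 - 3*8 = 136 bytes.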
Leonard/OXYGENE.
ijor
Hardware Guru
Posts: 4013
Joined: Sat May 29, 2004 7:52 pm

Re: STE Blitter

Post by ijor »

Hi Leonard

Good to see! How are you doing?
leonard wrote: Wed Jun 17, 2020 7:37 pm I don't know, you can clear, you can "or" buffers quite fast. As a rule of thumb, blitter is doing exactly what CPU could do, but without the cost of CPU instruction decoding. ( so it's like if you have a instruction cache and CPU instructions are free). For instance, clearing a one bitplan with CPU is 3 nops per word ( move.w dn,xx(an) ) : 2 nops to fetch the instruction, and 1 nop to write. Blitter will be only 1 nop ( no more instruction decoding)
I think the best part in Blitter is that it can shift for free. That would be way much faster than doing that with the CPU.

I know that is not much used in demos, but that seems to be the main goal of the Blitter: to be able to perform a generic BitBlt copy. In retrospect it was probably a bit of a waste, since it turned out that optimizing the VDI and LINE-A software was more significant.
Fx Cast: Atari St cycle accurate fpga core
leonard
Moderator
Posts: 665
Joined: Thu May 23, 2002 10:48 pm

Re: STE Blitter

Post by leonard »

Hi Ijor!

Good to see you too! I'm doing fine. What about your FPGA atari core?

And yes, you're right, the Blitter is also more powerful than the CPU at shifting! Blitter shifts are just "free".
fenarinarsa
Atari freak
Posts: 55
Joined: Sat Mar 15, 2014 11:23 pm

Re: STE Blitter

Post by fenarinarsa »

ijor wrote: Wed Jun 17, 2020 8:07 pm I think the best part in Blitter is that it can shift for free. That would be way much faster than doing that with the CPU.

I know that is not much used in demos, but that seems the main goal of Blitter, to be able to perform a generic BitBlt copy. On retrospective it was probably a bit of a waste when it turned out that optimizing the VDI and LINE-A software was more significant.
I don't really agree. Had the blitter been available in STFs in 1987 (when it was released for the Mega ST, and when placeholders for the blitter actually started to pop up on STF motherboards with TOS 1.02), it would have been very interesting for games. They could have used it to shift elements when there was not enough memory for preshifting, and fallen back to a software BitBlt on older STs.
ijor
Hardware Guru
Posts: 4013
Joined: Sat May 29, 2004 7:52 pm

Re: STE Blitter

Post by ijor »

leonard wrote: Wed Jun 17, 2020 9:48 pm Good to see you too! I'm doing fine. What about your FPGA atari core?
You can read the Blitter source here if you are interested: https://github.com/ijor/stBlitter
Frank B
Atari God
Posts: 1029
Joined: Wed Jan 04, 2006 1:28 am
Location: Boston

Re: STE Blitter

Post by Frank B »

Even with regular masking the blitter is faster than pre-shifting with the CPU. It uses 20 times less memory when doing it on a 32*32 sprite.
The CPU cannot beat the blitter executing the same algorithm via and/or passes. The CPU cannot beat the blitter when individual planes need to be accessed. Shifts are free. X/Y increments are free. Overhead for a general-purpose bitblit is very high. That's why the OS is slow. You don't need that complexity if you can constrain the inputs or can draw multiple objects of the same size. The setup overhead is comparable to the Amiga's.
They've been using it for years to great effect. There's no excuse not to use it :)
Zamuel_a
Atari God
Posts: 1242
Joined: Wed Dec 19, 2007 8:36 pm
Location: Sweden

Re: STE Blitter

Post by Zamuel_a »

- What's the handiest way to use the Blitter to fill polygons? Can it fill vertically / horizontally?
I made some tests a few years ago and have an example here:
viewtopic.php?f=68&t=27153&p=261513&hil ... on#p261513
ST / STFM / STE / Mega STE / Falcon / TT030 / Portfolio / 2600 / 7800 / Jaguar / 600xl / 130xe
keops
Atari Super Hero
Posts: 609
Joined: Mon Jul 26, 2004 3:39 pm
Location: Canada

Re: STE Blitter

Post by keops »

Thanks guys, once I'm done with my current little project I will give it a try
leonard
Moderator
Posts: 665
Joined: Thu May 23, 2002 10:48 pm

Re: STE Blitter

Post by leonard »

ijor wrote: Thu Jun 18, 2020 3:09 amYou can read the Blitter source here if you are interested: https://github.com/ijor/stBlitter
Did you really think I was able to read VHDL? I don't understand anything about this hardware stuff :( (but I'd love to)
Cyprian
10 GOTO 10
Posts: 1964
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: STE Blitter

Post by Cyprian »

ijor wrote: Thu Jun 18, 2020 3:09 am You can read the Blitter source here if you are interested: https://github.com/ijor/stBlitter
I don't know VHDL either, but your code is a nice piece of code to learn from.

Does that mean the CPU/BLiTTER split is 64 / 64?

Code:

	/* Non HOG DMA cycle counter. Counts 64 bus cycles (any bus cycles).	
	BLITTER Buglet: Starts counting as soon as BUSY is set.
The outcome of my tests on a real STE is that the split isn't rigid.
The CPU part can take more than 64 bus cycles, and the BLiTTER takes 65 on the STE and 66 on the Mega STE. In the last two cases only 63 bus cycles are for data; the rest are for bus mastering.
npomarede
Atari God
Posts: 1348
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: STE Blitter

Post by npomarede »

Cyprian wrote: Fri Jun 19, 2020 10:37 am the outcome from my tests on the real STe is that split is isn't stiff.
CPU part can takes more bus cycles than 64 and the BLiTTER takes 65 on STE and 66 on Mega STE. And in two last cases, only 63 bus cycles are for data the rest are for bus mastering.
Hi
If we leave the bus arbitration time aside, the blitter in non-hog mode gives 64 bus accesses to the CPU and 64 bus accesses to itself.
As I measured some years ago for Hatari, depending on the instruction used to start the blitter (if the write is made first and the prefetch after), the blitter will sometimes wrongly count 1 CPU bus access for the prefetch as if it were 1 blitter bus access. So in the end the blitter only does 63 bus accesses itself, processing one word less than expected.
I was glad to see that Ijor's VHDL confirmed my guess that the blitter sometimes miscounts the first bus access :D
Nicolas
ijor
Hardware Guru
Posts: 4013
Joined: Sat May 29, 2004 7:52 pm

Re: STE Blitter

Post by ijor »

Cyprian wrote: Fri Jun 19, 2020 10:37 am does that mean CPU/BLiTTER split 64 / 64?

Code:

	/* Non HOG DMA cycle counter. Counts 64 bus cycles (any bus cycles).	
	BLITTER Buglet: Starts counting as soon as BUSY is set.
the outcome from my tests on the real STe is that split is isn't stiff.
CPU part can takes more bus cycles than 64 and the BLiTTER takes 65 on STE and 66 on Mega STE. And in two last cases, only 63 bus cycles are for data the rest are for bus mastering.
The comment you quoted doesn't really say exactly how many bus cycles Blitter takes. It just means that Blitter tries to count 64 bus cycles used for itself, but because of the bug, sometimes it would include one bus cycle used by the CPU instead. This is the reason for the observed 63 cycles.

Now, exactly how many clock cycles each "turn" takes, and not just how many DMA bus cycles were performed, it's a bit complicated because it depends on several things. At least one bus cycle is always wasted by Blitter when initializing its internal machine state.
npomarede wrote: Fri Jun 19, 2020 3:42 pm As I measured some years ago for Hatari, depending on the instruction used to start the blitter (if write is made first then prefetch after) then the blitter will sometimes wrongly count 1 cpu bus access for prefetch as if it was 1 blitter bus access.
Not exactly. The order of the bus cycles in the instruction doesn't matter for this purpose, because DMA doesn't depend on instruction boundaries. What matters here is whether the instruction has idle clock cycles in which it doesn't access the bus.

The order of bus cycles you mentioned might determine whether the next instruction starts running before Blitter takes control or not. But this is a different thing. Blitter would still use 63 bus cycles for itself either way.
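As a very rough model of the non-HOG turn taking discussed above, the sketch below counts how many Blitter bursts a blit needs and how many bus accesses the CPU gets in between. It is a simplification under stated assumptions: bus arbitration, the wasted initialization cycle, and the CPU restarting the Blitter early are all ignored, and the function name is made up for the example.

```python
import math


def shared_mode_turns(blitter_accesses, burst=64, cpu_share=64):
    """Return (blitter bursts, CPU bus accesses granted between bursts).

    burst=64 is the nominal case; burst=63 models the case where one CPU
    access is miscounted as a Blitter access in every burst.
    """
    bursts = math.ceil(blitter_accesses / burst)
    cpu_accesses = (bursts - 1) * cpu_share
    return bursts, cpu_accesses


# Full-screen raw copy: 16000 words, one read and one write each.
accesses = 16000 * 2
print(shared_mode_turns(accesses))            # nominal 64 / 64 split
print(shared_mode_turns(accesses, burst=63))  # 63 data accesses per burst
```

Even in this idealized model the blit takes roughly twice as long in wall-clock terms as in HOG mode, because the CPU is handed about as many bus accesses as the Blitter uses.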
Cyprian
10 GOTO 10
Posts: 1964
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: STE Blitter

Post by Cyprian »

ijor wrote: Fri Jun 19, 2020 4:28 pmThe comment you quoted doesn't really says exactly how many bus cycles Blitter takes. It just means that Blitter tries to count 64 bus cycles used for itself, but because of the bug, sometimes it would include one bus cycle used by the CPU instead. This is the reason for the observed 63 cycles.

Now, exactly how many clock cycles each "turn" takes, and not just how many DMA bus cycles were performed, it's a bit complicated because it depends on several things.
that's very interesting.
In my tests I wasn't able to get results different from the mentioned 65 BLiTTER bus cycles (63+2), or 66 (63+3) on the MSTE. But I only did a few specific tests.
Some time ago I read about a TAS and BLiTTER issue on the EmuTOS and Hatari mailing lists.

Anyway, it would be cool to find such a corner case and reproduce it.
ijor wrote: Fri Jun 19, 2020 4:28 pm At least one bus cycle is always wasted by Blitter when initializing its internal machine state.
true.

btw would be cool to read more about the BLiTTER internals