Blitter vs movem for solid blocks of colour in game code

chicane
Captain Atari
Posts: 194
Joined: Mon Jul 02, 2012 11:25 am
Location: Leeds, UK

Post by chicane »

Some readers might know that I'm working on an STE-enhanced version of Lotus 1. I've been working through the graphics routines, replacing 68000-driven routines with Blitter routines where it makes sense to do so.

I've now arrived at the following loop, each iteration of which draws a single line (320 pixels) of colour 15 to the framebuffer.

Code: Select all

$00079412 : 48e4 fcf0                          movem.l   d0-d5/a0-a3,-(a4)
$00079416 : 48e4 fcf0                          movem.l   d0-d5/a0-a3,-(a4)
$0007941a : 48e4 fcf0                          movem.l   d0-d5/a0-a3,-(a4)
$0007941e : 48e4 fcf0                          movem.l   d0-d5/a0-a3,-(a4)
$00079422 : 51cf ffee                          dbra      d7,$79412
My understanding is that a movem.l instruction with a predecrement as above needs 8 cycles, plus 8 cycles per register specified. With ten registers, each of the above movem.l instructions would need 8 + 10 x 8 = 88 cycles, resulting in a total of 352 cycles per line drawn (not counting the dbra).
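
As a rough sketch of the arithmetic above, here is the same loop in assembler source form with the nominal 68000 cycle counts as comments. The label name and the register setup described in the comments are assumptions for illustration, not taken from the Lotus source. The dbra figure is the nominal 68000 timing for a taken branch and is not part of the 352-cycle figure quoted above.

```asm
; Sketch only: label and register setup are assumed, not from the game.
; Assumed on entry:
;   d0-d5/a0-a3 = $ffffffff (colour 15 = all four bitplanes set)
;   a4          = address just past the end of the area to fill
;   d7          = number of lines to draw, minus 1
fill_line:
        movem.l d0-d5/a0-a3,-(a4)   ; 10 regs: 8 + 8*10 = 88 cycles, 40 bytes
        movem.l d0-d5/a0-a3,-(a4)   ; 88 cycles, 80 bytes so far
        movem.l d0-d5/a0-a3,-(a4)   ; 88 cycles, 120 bytes so far
        movem.l d0-d5/a0-a3,-(a4)   ; 88 cycles, 160 bytes = one 320-pixel line
        dbra    d7,fill_line        ; 10 cycles (nominal) when the branch is taken
                                    ; per line: 4*88 = 352, or 362 with the dbra
```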

I've had a go this morning at optimising this using the Blitter. Thankfully, this is colour 15, so I just need to draw a solid sequence of $ffff word values to the screen buffer - there's no added complexity resulting from needing to write different values for different words.

This is game code, so there are various interrupts going on in the background that need to happen. We therefore have no choice but to use Hog mode and split the blits into small chunks - we can't have the Blitter tying up the bus for long periods by (for example) drawing all lines in one go. Shared mode isn't going to work because the time slices are not sufficiently granular and interrupts will get blocked.

My understanding is that the best-case scenario for the Blitter is 1 nop per word written. So given that we're writing 80 words per line, that's 80 nops, which, at 4 cycles per nop, equates to 320 cycles (versus 352 cycles for the movem). However, we also have to start the Blitter for each line, which, as a bare minimum, might look like this:

Code: Select all

move.w d1,(a1) ; ycount (8a38)
move.b d2,(a0) ; blitter control (8a3c)
I believe that each of the above two instructions amounts to 8 cycles, taking our total cycles per line for the Blitter to 336 cycles. However, it turns out in this particular scenario that writing 80 words at a time starts to slow down the music (presumably by blocking interrupts), so in order to mitigate this, I can split the drawing of each line into two stages, running the move.w and move.b twice. This takes us up to 352 cycles, which conveniently matches the original movem implementation. There's now no difference in timing between using movem and Blitter, except that movem isn't messing with interrupts.
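
The split-line variant described above can be sketched like this, with the cycle counts from the post as comments. The register contents are assumptions: the post does not say how the x count is divided, so this sketch assumes an x count of 40 words with the y count reloaded to 1 for each half, and a control value of $c0 (BUSY plus HOG).

```asm
; Sketch only. Assumed on entry (hypothetical setup, not from the game):
;   a0 -> $ffff8a3c (Blitter control), a1 -> $ffff8a38 (y count)
;   d1 = 1, d2 = $c0 (bit 7 = BUSY, bit 6 = HOG)
;   x count = 40 words, destination increments set for a contiguous fill
        move.w  d1,(a1)             ; 8 cycles: y count = 1 (first half of the line)
        move.b  d2,(a0)             ; 8 cycles: start; Blitter holds the bus for
                                    ;           40 words * 4 = 160 cycles
        move.w  d1,(a1)             ; 8 cycles: y count = 1 (second half)
        move.b  d2,(a0)             ; 8 cycles: start; another 160 cycles
                                    ; per line: 2 * (8 + 8 + 160) = 352 cycles
```

Interrupts can only be serviced in the gaps between the two blits, which is why halving the chunk size helps the music at the cost of 16 extra cycles per line.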

So in conclusion, I know that the Blitter will outperform movem in theoretical terms, even for drawing solid blocks of colour. But when taking into account the constraints put in place by game code, is it fair to say that things aren't so clear cut, and that sometimes the Blitter will be ahead and sometimes movem will be ahead? For simplicity, I've omitted the initial setup code for both movem and Blitter from this discussion, but they appear roughly equivalent in terms of size and timing.

I've gained a lot from using the Blitter in Lotus STE, but in this particular scenario it doesn't feel like the right thing to do.

Thoughts and corrections from the audience would be much appreciated!
Cyprian
10 GOTO 10
Posts: 2216
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Post by Cyprian »

Yep, in your particular case the BLiTTER isn't faster than movem, but it still has a big advantage: 68k register usage. With movem, 12 registers (d0-d5/a0-a3, a4, d7) are trashed; with the BLiTTER, only four: d1, a1, d2, a0.

Additionally, you have the possibility of running the CPU in parallel with the BLiTTER.
Mega ST 1 / 7800 / Portfolio / Lynx II / Jaguar / TT030 / Mega STe / 800 XL / 1040 STe / Falcon030 / 65 XE / 520 STm / SM124 / SC1435
DDD HDD / AT Speed C16 / TF536 / SDrive / PAK68/3 / Lynx Multi Card / LDW Super 2000 / XCA12 / SkunkBoard / CosmosEx / SatanDisk / UltraSatan / USB Floppy Drive Emulator / Eiffel / SIO2PC / Crazy Dots / PAM Net
Hatari / Steem SSE / Aranym / Saint
http://260ste.atari.org

Post by chicane »

Thanks for confirming my findings Cyprian.
Cyprian wrote: Fri Mar 19, 2021 1:43 pm Yep, in your particular case the BLiTTER isn't faster than movem, but it still has a big advantage: 68k register usage. With movem, 12 registers (d0-d5/a0-a3, a4, d7) are trashed; with the BLiTTER, only four: d1, a1, d2, a0.
That's very true. In this case I can't really leverage the benefit without attempting to understand and rewrite the Lotus code that surrounds this block, but it's definitely applicable in a more general context.
Cyprian wrote: Fri Mar 19, 2021 1:43 pm Additionally, you have the possibility of running the CPU in parallel with the BLiTTER.
That's true. Again, I can't very easily do that in this particular context. But you have reminded me of another area of the Lotus code where I've been meaning to run a MUL instruction in parallel with the Blitter - thanks!

Post by Cyprian »

chicane wrote: Fri Mar 19, 2021 3:37 pm But you have reminded me of another area of the Lotus code where I've been meaning to run a MUL instruction in parallel with the Blitter - thanks!
sounds cool
Frank B
Atari God
Posts: 1042
Joined: Wed Jan 04, 2006 1:28 am
Location: Glasgow

Post by Frank B »

If you have a lot of blit overhead, then that’s going to cost you. Speed-wise there isn’t that much difference on a raw contiguous copy or filling with a constant. The instruction overhead is minimal. If you were to unroll it for speed, the blitter may save on code size. If you were doing a non-contiguous copy, the blitter would be much faster, e.g. copying 1 out of every four words to hit a single plane. The increment on source and dest is free. Add a logical op and it gets even faster. Add shifting and masking and it is much, much faster. It depends on what you are doing. I’d say the blitter would be faster in most cases and save on code space.

Post by chicane »

Frank B wrote: Fri Mar 19, 2021 9:12 pm If you have a lot of blit overhead, then that’s going to cost you. Speed-wise there isn’t that much difference on a raw contiguous copy or filling with a constant. The instruction overhead is minimal. If you were to unroll it for speed, the blitter may save on code size. If you were doing a non-contiguous copy, the blitter would be much faster, e.g. copying 1 out of every four words to hit a single plane. The increment on source and dest is free. Add a logical op and it gets even faster. Add shifting and masking and it is much, much faster. It depends on what you are doing. I’d say the blitter would be faster in most cases and save on code space.
Many thanks for your comments Frank. I don't know much about electronics and hardware, but it's a shame the designers of the blitter couldn't figure out a way to stop it from blocking interrupts on large blits (putting aside shared mode which is ultimately a bit useless). If that stumbling block was dealt with, we could use it for just about everything!
AdamK
Captain Atari
Posts: 352
Joined: Wed Aug 21, 2013 8:44 am

Post by AdamK »

The real solution would be to give the blitter real DMA. I think there are a lot of spare cycles on the bus to do that. It might get a bit slower, but it would be way more useful.
Atari: FireBee, Falcon030 + CT60e + SuperVidel + SvEthlana, TT, 520ST + 4MB ST RAM + 8MB TT RAM + CosmosEx + SC1435, 1040STFM + UltraSatan + SM124, 1040STE 4MB ST RAM + 8MB TT RAM + CosmosEx + NetUSBee + SM144 + SC1224, 65XE + U1MB + VBXE + SIDE2, Jaguar, Lynx II, 2 x Portfolio (HPC-006)

Adam Klobukowski [adamklobukowski@gmail.com]

Post by chicane »

AdamK wrote: Mon Mar 22, 2021 11:01 am The real solution would be to give the blitter real DMA. I think there are a lot of spare cycles on the bus to do that. It might get a bit slower, but it would be way more useful.
I'd say the Blitter is still extremely useful - it's just a shame that we have to use what are effectively programmatic workarounds (hog mode, small blits etc) to get it working nicely in tandem with the rest of the hardware under "real game" conditions.

I suppose it's a reflection of the fact that the Amiga chipset was designed all at once for the various components to work together, whereas the STE inherited the Blitter as a result of it being an earlier and optional add-on to higher-end STs.
elliot
Atari maniac
Posts: 76
Joined: Tue Mar 17, 2009 2:00 pm

Post by elliot »

I thought the Blitter was always meant to be part of the ST (or at least an optional extra), but the OS and the chip would not have been ready in time for launch (the same goes for a new sound chip).
Dbug
Atari freak
Posts: 64
Joined: Tue Jan 28, 2003 8:42 pm
Location: Oslo (Norway)

Post by Dbug »

Just to make sure I understand: when you were doing your first Blitter tests to erase (solid color), where did the $FFFF come from?

As far as I know, there are three possibilities:
- Set the HOP combination rule to 2 and use a source pointer on a single $FFFF word in memory and have source increment set to 0
- Set the HOP combination rule to 0 to force each of the destination values to $FFFF
- Set the HOP combination rule to 1 to use the internal halftone memory as fill pattern (based on the line number)

Which setup were you using?

Post by chicane »

Dbug wrote: Fri May 14, 2021 9:11 am Just to make sure I understand: when you were doing your first Blitter tests to erase (solid color), where did the $FFFF come from?

As far as I know, there are three possibilities:
- Set the HOP combination rule to 2 and use a source pointer on a single $FFFF word in memory and have source increment set to 0
- Set the HOP combination rule to 0 to force each of the destination values to $FFFF
- Set the HOP combination rule to 1 to use the internal halftone memory as fill pattern (based on the line number)

Which setup were you using?
I was writing a word value $ff to $ffff8a3a, which according to http://alive.atari.org/alive6/ste3.php means:

- HOP = 0 = blind copy
- OP = ff = Target is set to "1"
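
For readers following along, a fill of this kind might be set up roughly as below. This is a sketch under stated assumptions, not the Lotus STE code: the routine name, the choice of a1 for the destination, and the 80-word x count are all hypothetical. The register offsets are the standard Blitter ones based at $ffff8a00. Writing the word $00ff to $8a3a puts $00 in the HOP byte and $ff in the OP byte at $8a3b; OP is a 4-bit field, so the effective value is 15, which sets every destination bit to 1 regardless of source.

```asm
; Sketch only: fill 80 contiguous words with $ffff using OP = 15.
; Assumed on entry (hypothetical): a1 = destination address (word aligned).
blit_fill_line:
        lea     $ffff8a00.w,a0      ; Blitter register base
        move.w  #$ffff,$28(a0)      ; endmask 1: write all bits of the first word
        move.w  #$ffff,$2a(a0)      ; endmask 2: write all bits of middle words
        move.w  #$ffff,$2c(a0)      ; endmask 3: write all bits of the last word
        move.w  #2,$2e(a0)          ; dest x increment: next word
        move.w  #2,$30(a0)          ; dest y increment: keep going contiguously
        move.l  a1,$32(a0)          ; dest address
        move.w  #80,$36(a0)         ; x count: 80 words = one 160-byte line
        move.w  #$00ff,$3a(a0)      ; HOP ($8a3a) = $00, OP ($8a3b) = $ff -> op 15
        move.b  #0,$3d(a0)          ; skew: no shift, no extra source reads
        move.w  #1,$38(a0)          ; y count: one row
        move.b  #$c0,$3c(a0)        ; BUSY | HOG: start the blit
        rts                         ; in hog mode the CPU only gets the bus back
                                    ; here once the blit has finished
```

Only the y count and control writes need repeating for each subsequent chunk, which is where the two-instruction restart earlier in the thread comes from.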
metalages
Captain Atari
Posts: 172
Joined: Thu Jun 06, 2013 5:14 pm
Location: France

Post by metalages »

Have you identified which interrupt is used to run the music?

Post by chicane »

metalages wrote: Sun May 16, 2021 4:34 am Have you identified which interrupt is used to run the music?
I believe the music routine is driven entirely by the VBL. Junosix may be along at some point to correct me :)

Post by metalages »

These slowdowns sound as if the VBL interrupt does not run at all while a blit is running, rather than just being delayed? (Jittering the YM replay by a few scanlines would not be noticeable.)
junosix
Captain Atari
Posts: 362
Joined: Sun Jul 08, 2007 3:22 pm
Location: Plymouth

Post by junosix »

Off the top of my head, I think Timer C deals with the tempo and the vbl writes the values to the PSG, so if Timer C gets delayed that will mess with the speed of the music. If 60Hz TOS is used the music speed is the same but the software envelope sounds different.

Post by metalages »

Interesting.
Maybe moving everything to the VBL interrupt would fix the problem?
I am not an expert on the MFP (just basic Timer B use...), but if I understood correctly, you can specify whether a blocked interrupt is simply dropped or is delayed (in which case the signal has to be reset by the interrupt code). Am I wrong? If this is the case, maybe you could try changing the way Timer C is used a bit? Or maybe it happens when Timer C is delayed from just before the VBL to just after the VBL?

Post by junosix »

Just checked - I was wrong about Timer C, it isn't enabled at all, music is actually all part of the VBL (the difference for 50/60Hz is done by checking $ffff820a and adjusting the tempo accordingly). Might be able to be solved by moving the DMA mixer elsewhere in the vbl, but that may have consequences itself by making the sampled sound have pops and clicks here and there if that routine is delayed. Jon's got a better idea of how the VBL sequence works in its entirety than I do, so I can't say what else may be affected by moving things about there.

Post by chicane »

Dbug hasn't returned to this thread to provide his thoughts, but my feeling is that the performance benefits of trying to use the Blitter in this scenario are so marginal that it wouldn't be worth the effort trying to reorganise the VBL code in an attempt to replace the existing movem code with Blitter code.

There are a couple of other areas that we'd probably want to visit first - the lap counter running along the left of the screen being one of them. It's currently rendered as a 16x96 pixel bitmap on each frame. I have done some optimisation work, but there's further room for improvement - specifically with respect to not rendering rows where the lap counter doesn't actually appear.

Post by Dbug »

The only big performance optimization I could see would require a massive change of how things are done.
Basically, the code is using timers to perform the color changes, and that's a massive performance hit because the cost of an IRQ itself is huge: 44 cycles JUST for the interrupt call, plus 20 cycles for the RTE, plus whatever code you need inside to preserve the context. If used on the 200 lines of the screen, that's already 8% of the total CPU budget for the frame gone, doing nothing at all.
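
To make the 8% figure concrete, here is a hypothetical per-scanline handler with the fixed costs annotated. The handler body is an assumption purely for illustration (the actual Lotus handler is not shown in this thread), and any MFP end-of-interrupt handling is left out. The frame budget assumes a 50 Hz display with 313 lines of 512 cycles each.

```asm
; Hypothetical Timer B handler, only to show where the fixed cost sits.
timer_b:                            ; 44 cycles: interrupt exception processing
        move.w  #$0777,$ffff8240.w  ; example payload: change one palette entry
        rte                         ; 20 cycles: return from exception
; Fixed overhead per interrupt:  44 + 20        = 64 cycles
; Over 200 visible lines:        64 * 200       = 12800 cycles
; CPU budget per 50 Hz frame:    313 * 512      = 160256 cycles
; Share of the frame:            12800 / 160256 = about 8 %
```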

And then of course, because you must not disturb the IRQs, you have to run the Blitter in bus-sharing mode as a consequence.

Regarding specifically the cleaning of the background, I assume the code already only clears what needs to be cleaned from the previous frame to the current one?

Post by Frank B »

This might be a dumb idea of mine :) Anyway, you're already triggering a Timer B interrupt all through the frame for the road and sky. Could you track where the visible area of the screen starts and ends too? If you're in the borders, don't yield at all on blits. Instead, use hog mode with full y line length and yield when blitting the next plane. It depends on whether the overhead is too expensive, but it would speed up all blits significantly for 1/3rd of the screen frame.