Blitter Execution Times


uko
Atari User
Posts: 36
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Blitter Execution Times

Post by uko »

Hi!

Until now I have mainly used the following table (together with the Blitter FAQ by The Paranoid of Paradox) to estimate blitter execution times when sizing my code:
http://retrospec.sgn.net/users/tomcat/m ... tm#BLITTER

These timings seem OK as long as SKEW and NFSR are set to 0.
But when these registers take other values, or even when the mask values are simply changed to something other than $FFFF, these timings are no longer accurate.
This is logical in fact, and after several tests I now have a better understanding of the bus accesses the blitter requires. Nevertheless, since I am working only with emulators (Hatari and Steem SSE), I wonder whether there is a more exhaustive timing table that could confirm my assumptions.

Do you have any other links that could help me?

Thanks
npomarede
Atari God
Posts: 1348
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: Blitter Execution Times

Post by npomarede »

Hi
If you want, you can have a look at Hatari's source code. I think its timings are really accurate, as they were verified with several demos that run in overscan while using the blitter at the same time (see the great "We were" by Oxygene, for example).
I also compared several of my own test programs on a real STE and in Hatari to check special cases, such as a long instruction (MUL or DIV) running at the moment the blitter starts.

Look in blitter.c at all the places where Blitter_AddCycles() is called.
But basically, you need to count 4 cycles for each physical read from RAM and 4 cycles for each physical write.

Depending on the logical operation you want the blitter to perform, you can have up to 2 reads and 1 write per word. Skewing or masking doesn't add any cycles.
(There are a few corner cases with XCOUNT=1 that Hatari doesn't handle correctly yet, but the cycle counts should be OK for 99% of the remaining cases.)
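Nicolas's rule of thumb can be turned into a tiny estimator. This is a minimal C sketch, not Hatari's code: it assumes HOP = 2 (source only, no halftone), full $FFFF masks and no FXSR/NFSR, and decides from the 4-bit OP value alone whether a source and/or destination read is needed. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Blitter OP (0..15): bit n is the result for one (S, D) pair, with
 * n = 2*(1-S) + (1-D). Bits 0-1 are the S=1 cases, bits 0 and 2 the D=1 cases.
 * e.g. op 3 = 0b0011 = "source", op 7 = 0b0111 = "S OR D". */
static bool op_uses_source(uint8_t op)
{
    /* The result depends on S if the S=1 half differs from the S=0 half. */
    return (op & 3u) != ((op >> 2) & 3u);
}

static bool op_uses_dest(uint8_t op)
{
    /* The result depends on D if the D=1 bits differ from the D=0 bits. */
    return (op & 5u) != ((op >> 1) & 5u);
}

/* Cost of one destination word: 4 cycles per bus access,
 * at most 2 reads (source, destination) plus 1 write. */
static unsigned word_cycles(uint8_t op)
{
    unsigned accesses = 1;              /* the destination write */
    if (op_uses_source(op)) accesses++; /* source read */
    if (op_uses_dest(op))   accesses++; /* destination read */
    return 4u * accesses;
}
```

With this model mode 3 (source) costs 8 cycles per word and mode 7 (S OR D) costs 12, which are the figures quoted later in the thread.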

Nicolas
uko
Atari User
Posts: 36
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Post by uko »

Nicolas,

Thanks for pointing me to the Hatari source code, I'll have a look at it.

I am quite surprised that masking does not add cycles, because in my attempts (which I must check again, of course), Hatari seems to use more cycles when the mask is not $FFFF. But I am in the case where XCount = 1 (in fact XCount = 1 when SKEW = 0, and XCount = 2 with NFSR = 1 when SKEW <> 0), so that could explain it.
I am surprised because, to me, there should be no difference for the blitter between using:
- the mode 7 (source OR destination)
- the mode 3 (source) with a mask different from $FFFF

In both cases this amounts to doing an OR, yet the first case requires 12 cycles whereas the second would only require 8.

But of course I'm going to check all this again before going further.

Thanks for your help.
npomarede
Atari God
Posts: 1348
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: Blitter Execution Times

Post by npomarede »

Masking itself doesn't add cycles, but the blitter doc states that in some cases, depending on the mask value, a read-modify-write (RMW) operation is performed, which adds cycles because of the extra memory access required (see this comment in the source: "When NFSR or mask is not all '1', a read-modify-write is always performed").

EDIT: note that if the case you want to compare is XCOUNT=1 with NFSR and/or FXSR set, you may hit the case that Hatari doesn't support yet, where the cycle count is certainly not correct.
ijor
Hardware Guru
Posts: 4013
Joined: Sat May 29, 2004 7:52 pm
Contact:

Re: Blitter Execution Times

Post by ijor »

uko wrote: I tell you I am surprised because for me there should not be any difference for the blitter between using:
- the mode 7 (source OR destination)
- the mode 3 (source) with mask different from $FFFF

In both cases this corresponds to doing an OR, and the first case requires 12 cycles, whereas the second would only require 8.
Both cases do have the same timing. As Nicolas says, a mask forces a destination read regardless of the mode. But a mask does not alter the timing if the mode already requires a destination read cycle.
npomarede wrote:(see this comment in the source " When NFSR or mask is not all '1', a read-modify-write is always performed")
NFSR doesn't force a destination read cycle. Why would it? NFSR affects the reading of the source, not the destination.

It is true that there is not much point in using NFSR without a mask. But if, for some reason, NFSR is set, the active mask is $FFFF and the OP mode doesn't need the original destination data, then the Blitter doesn't perform a destination read.
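ijor's rule for the destination read can be written as a hypothetical C helper (a sketch of the rule as stated in this thread, not code from any emulator): the read happens if the OP needs D or the active mask is not $FFFF, and NFSR is deliberately not part of the condition.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the OP result depends on the destination operand D
 * (bits 0,2 of the OP are the D=1 cases, bits 1,3 the D=0 cases). */
static bool op_uses_dest(uint8_t op)
{
    return (op & 5u) != ((op >> 1) & 5u);
}

/* Destination read for one word: needed by the OP, or forced by a
 * partial mask. NFSR/FXSR only affect source reads, so they are absent here. */
static bool dest_read(uint8_t op, uint16_t active_mask)
{
    return op_uses_dest(op) || active_mask != 0xFFFFu;
}
```

So mode 3 with a partial mask and mode 7 (with any mask) both end up with source read + destination read + write, i.e. 3 accesses or 12 cycles per word.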
Fx Cast: Atari St cycle accurate fpga core
Cyprian
10 GOTO 10
Posts: 1964
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: Blitter Execution Times

Post by Cyprian »

npomarede wrote:Skewing or masking don't add any additional cycles.
If I'm not wrong:
- when the mask is different from $FFFF, you have to add one bus cycle for the destination read;
- skewing has an impact on cycles when FXSR / NFSR is set.
Portfolio / Lynx II / Jaguar / TT030 / Mega STe / 800 XL / 1040 STe / Falcon030 / 65 XE / 520 STm / SM124 / SC1435
SDrive / PAK68/3 / Lynx Multi Card / LDW Super 2000 / XCA12 / SkunkBoard / CosmosEx / SatanDisk / UltraSatan / USB Floppy Drive Emulator / Eiffel / SIO2PC / Crazy Dots / PAM Net
Hatari / Steem SSE / Aranym / Saint
http://260ste.appspot.com/
npomarede
Atari God
Posts: 1348
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: Blitter Execution Times

Post by npomarede »

Cyprian wrote:
npomarede wrote:Skewing or masking don't add any additional cycles.
If I'm not wrong:
- when mask is different from $FFFF then you have to add one bus cycle for destination read;
- skewing has an impact on cycles when FXSR / NFSR is set.
Hi
Not to be too punctilious, but I would say these two conditions have an indirect impact on the total number of cycles, in the sense that skewing (shifting) bits per se doesn't take any cycles, and the same goes for masking (compare the 68000, where LSL or AND really do take cycles).
What really matters in the end is how many words you read/write, which can be a consequence of the logical operation, of the FXSR/NFSR values, or of the mask value.
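Putting these pieces together, the bus accesses per line can be sketched as below. This is a hypothetical C model, not Hatari's implementation: it assumes HOP = 2 (source only), FXSR adding one source read per line and NFSR removing the final one, ENDMASK1/2/3 applied to the first/middle/last word, and 4 cycles per access. It ignores SMUDGE, halftone and the XCOUNT=1 + NFSR corner case discussed later in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

static bool op_uses_source(uint8_t op) { return (op & 3u) != ((op >> 2) & 3u); }
static bool op_uses_dest(uint8_t op)   { return (op & 5u) != ((op >> 1) & 5u); }

/* Active endmask for word i of a line (XCOUNT=1 uses ENDMASK1). */
static uint16_t active_mask(unsigned i, unsigned xcount, const uint16_t endmask[3])
{
    if (i == 0)          return endmask[0]; /* ENDMASK1: first word */
    if (i == xcount - 1) return endmask[2]; /* ENDMASK3: last word */
    return endmask[1];                      /* ENDMASK2: middle words */
}

/* Bus accesses for one line of xcount (>= 1) destination words. */
static unsigned line_accesses(uint8_t op, unsigned xcount, bool fxsr, bool nfsr,
                              const uint16_t endmask[3])
{
    unsigned n = 0;
    if (op_uses_source(op))
        n += xcount + (fxsr ? 1u : 0u) - (nfsr ? 1u : 0u);
    for (unsigned i = 0; i < xcount; i++) {
        if (op_uses_dest(op) || active_mask(i, xcount, endmask) != 0xFFFFu)
            n++; /* destination read (RMW) */
        n++;     /* destination write */
    }
    return n;    /* multiply by 4 for cycles */
}
```

For example, op 3 with XCOUNT=2, NFSR set, ENDMASK1=$0FFF and ENDMASK3 forced to $FFFF gives 4 accesses per line in this model; with ENDMASK3=$F000 it gives 5.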
Nicolas
uko
Atari User
Posts: 36
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Post by uko »

ijor wrote: It is true that there is not much point in using NFSR without a mask. But (if for some reason) NFSR is set and the active mask is $FFFF and the OP mode doesn't require the original dest data, then Blitter doesn't perform a destination read.
It may be useful if you use bitplanes as layers, with 16-pixel-wide graphics.
npomarede wrote:What really matters in the end is how many words you read /write, which can be a consequence of the logical operation, or the value xfsr/nfsr, or the value of the mask.
Yes, you are right; I took shortcuts in my explanation by mixing up cycles and bus accesses.
ijor
Hardware Guru
Posts: 4013
Joined: Sat May 29, 2004 7:52 pm
Contact:

Re: Blitter Execution Times

Post by ijor »

npomarede wrote:What really matters in the end is how many words you read /write, which can be a consequence of the logical operation, or the value xfsr/nfsr, or the value of the mask.
Right. For completeness, I should add the SMUDGE setting as well.
uko wrote:
ijor wrote: It is true that there is not much point in using NFSR without a mask ...
It may be useful if you use bitplanes as layers, with 16 pixels wide graphics.
How would it be useful to enable NFSR without a mask???
uko
Atari User
Posts: 36
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Post by uko »

ijor wrote:How it would be useful to enable NFSR without a mask ???
In very limited cases... But imagine you have a background using only 3 bitplanes, and a one-bitplane 16x200 sprite moving over it (with suitable palette settings, of course). In this case, when you set both SKEW and NFSR, you don't mind also writing the non-masked bits, since they won't disturb the background.
And if your sprite is wider than 16 pixels and you display it in 16-pixel slices (because that is easier to fit into a fullscreen), you could use this for the leftmost slice.
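One way to arrange the "suitable palette settings" for this layering trick, assuming the sprite lives in bitplane 3 (bit 3 of each pixel's colour index) and the background uses planes 0-2: every index with bit 3 set shows the sprite colour, so the background planes never show through the sprite. This is a hypothetical helper; colours are treated as plain STE $0RGB words.

```c
#include <stdint.h>

/* Build a 16-entry palette where plane 3 acts as an opaque one-colour layer
 * over an 8-colour (3-bitplane) background. */
static void build_layer_palette(const uint16_t background[8], uint16_t sprite_colour,
                                uint16_t palette[16])
{
    for (unsigned i = 0; i < 16; i++)
        palette[i] = (i & 8u) ? sprite_colour  /* plane 3 set: sprite pixel */
                              : background[i]; /* i < 8: background colour */
}
```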
uko
Atari User
Posts: 36
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Post by uko »

Hi!

To make things clearer (at least for me!), I have written a small test program that counts the number of memory accesses performed by the blitter.
The source code is included in the zip file attached to this post.

This program blits a one-bitplane rectangle onto the screen, and the blue background color lets you measure the execution time. The grid has a dot every 5 lines.
At the beginning of the source there are parameters that make it easy to test various configurations:
- X position (if different from 0, SKEW and NFSR are set)
- Rectangle width (X Count)
- Forced mask values (by default they are computed from the SKEW value)
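The parameter setup in this list can be sketched for a one-bitplane, 16-pixel-wide source drawn at screen column x. This is a hypothetical reconstruction, not uko's source: it assumes a right shift (SKEW = x & 15), FXSR clear, and masks derived from the skew. Overwriting endmask[2] with $FFFF afterwards gives the "right mask forced" case.

```c
#include <stdbool.h>
#include <stdint.h>

struct blit_params {
    unsigned skew;       /* SKEW: right shift, 0..15 */
    unsigned xcount;     /* destination words per line */
    bool     nfsr;       /* suppress the final source read */
    uint16_t endmask[3]; /* ENDMASK1..3 */
};

static struct blit_params params_for_x(unsigned x)
{
    struct blit_params p;
    p.skew = x & 15u;
    if (p.skew == 0) {
        /* Word-aligned: one destination word, full mask. */
        p.xcount = 1;
        p.nfsr = false;
        p.endmask[0] = p.endmask[1] = p.endmask[2] = 0xFFFFu;
    } else {
        /* The shifted 16 pixels straddle two destination words, but only
         * one source word exists, so NFSR skips the second source read. */
        p.xcount = 2;
        p.nfsr = true;
        p.endmask[0] = (uint16_t)(0xFFFFu >> p.skew);  /* right part of word 1 */
        p.endmask[1] = 0xFFFFu;                         /* unused when xcount == 2 */
        p.endmask[2] = (uint16_t)~(0xFFFFu >> p.skew); /* left part of word 2 */
    }
    return p;
}
```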

I have run several tests with width = 2 words, but also with width = 1 word (even though Nicolas said there could be some problems with Hatari in this case).
The results are presented in the Excel sheet (one tab per width value), also included in the zip file. For each test configuration, I have given the theoretical number of accesses per line (as I understand it!) and the results obtained with the latest Hatari and Steem SSE versions. Sorry, I have no real hardware to test on...
I hope I have not made any mistakes in the code or in reporting the results...

In most cases, my understanding of the theory matches both emulators.

But for cases where SKEW/NFSR are set and the right mask is forced to $FFFF, I have found some discrepancies: Hatari and Steem do not perform the same number of memory accesses!
I lean towards Steem's results, but it would be nice if someone could have a look!
BLIT_Test.zip
npomarede
Atari God
Posts: 1348
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: Blitter Execution Times

Post by npomarede »

Hi
Could you provide the executable in your zip archive? My STE is currently unavailable (still packed after a recent house move), but maybe other people here could run your tests on their STE.
Nicolas
uko
Atari User
Posts: 36
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Post by uko »

npomarede wrote:Hi
could you provide the executable in your zip archive ? My STE is currently not available (still packed after recently moving home), but maybe some other people here could run your tests on their STE.
Nicolas
No problem. The source has to be recompiled to change the test parameters, but I have compiled a version that tests the strange case.
It is attached to this post.
It takes around 55 scanlines on Hatari and 48 on Steem SSE.
BlitTestTOS.zip
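Converting the measured scanline difference into cycles gives an idea of how many extra bus accesses are involved. This sketch assumes a 50 Hz STE, where one scanline is 512 CPU cycles at 8 MHz, and 4 cycles per blitter bus access; since the raster measurement is only good to about one line, the result is approximate.

```c
/* Assumed constants: 50 Hz STE timing and 4 cycles per bus access. */
enum { CYCLES_PER_LINE_50HZ = 512, CYCLES_PER_ACCESS = 4 };

static unsigned long scanlines_to_cycles(unsigned long lines)
{
    return lines * CYCLES_PER_LINE_50HZ;
}

static unsigned long cycles_to_accesses(unsigned long cycles)
{
    return cycles / CYCLES_PER_ACCESS;
}
```

Here 55 - 48 = 7 scanlines is 3584 cycles, i.e. roughly 896 extra bus accesses in total; dividing by the rectangle height would give the extra accesses per line.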
ijor
Hardware Guru
Posts: 4013
Joined: Sat May 29, 2004 7:52 pm
Contact:

Re: Blitter Execution Times

Post by ijor »

uko wrote: But for cases where SKEW/NFSR are set and the right mask is forced to $FFFF, I have found some discrepancies. Hatari and Steem do not perform the same number of memory accesses!
As I said already, NFSR doesn't force a destination read, which seems to be what Hatari is getting wrong. That seems to be the difference.
... but also with width = 1 word (even if Nicolas told that there could be some problems with Hatari in this case).
Everybody is wrong in this case. If XCOUNT is set to one, Blitter doesn't fully honor the NFSR bit and performs a final source read regardless. It might be considered a Blitter buglet, but you shouldn't need to use NFSR if XCOUNT is one.
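The XCOUNT=1 behaviour described here can be captured as a source-read counter. This is a hypothetical C sketch assuming HOP = 2: FXSR adds one initial source read and NFSR normally removes the final one, but when XCOUNT is 1 the final read is performed anyway. That is one reading of "doesn't fully honor"; what happens when FXSR is also set with XCOUNT=1 is not covered by the thread.

```c
#include <stdbool.h>

/* Source reads per line (xcount >= 1). */
static unsigned source_reads_per_line(unsigned xcount, bool fxsr, bool nfsr)
{
    unsigned reads = xcount + (fxsr ? 1u : 0u);
    /* NFSR drops the final read, except in the XCOUNT == 1 "buglet". */
    if (nfsr && xcount > 1)
        reads--;
    return reads;
}
```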
uko
Atari User
Posts: 36
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Post by uko »

ijor wrote:Everybody is wrong in this case. If XCOUNT is set to one, Blitter doesn't fully honor the NFSR bit and performs a final source read regardless. It might be considered a Blitter buglet, but you shouldn't need to use NFSR if XCOUNT is one.
In the post describing my tests, when I talk about setting Xcount to 1, it is with SKEW=0. As soon as SKEW is set to a non-zero value, the program automatically increments Xcount (and sets NFSR). So I have not tested the case you mention (Xcount=1 with NFSR set).
Sorry for the misunderstanding.
npomarede
Atari God
Posts: 1348
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: Blitter Execution Times

Post by npomarede »

uko wrote: No problem. The source needs to be recompiled to change the test parameters, but I have compiled a version that tests the strange case.
It is attached to this post.
It requires around 55 scanlines on Hatari and 48 on Steem SSE.

BlitTestTOS.zip
Hi
I will have a look at this case; I have some work in progress to fix some wrong cases where xcount=1 (as Ijor described them to me), and not much time at the moment.
I will see whether this can be fixed in the current code base or whether it needs to be part of the rewrite I'm planning.

Apart from the wrong timings, does the visual result (the pixels) look correct on screen anyway (similar to Steem in that case)? (I guess it is correct if the problem is just an unnecessary read.)

Nicolas
czietz
Hardware Guru
Posts: 1296
Joined: Tue May 24, 2016 6:47 pm

Re: Blitter Execution Times

Post by czietz »

Semi-on-topic: Hatari also differs significantly from real hardware in the timing of blits from or to a non-existent memory region. This is certainly a corner case, because why would anyone do that...
uko
Atari User
Posts: 36
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Post by uko »

npomarede wrote: I will have a look at this case; I have some work in progress to fix some wrong cases where xcount=1 (as Ijor described them to me) and not much time at the moment.
I will see if this can be fixed on current code base or if this needs to be part of the rewrite I'm planning to do.
Hi!
Thanks for your support! (And for all the work on Hatari!)
Feel free to PM me if you have any questions about the test code and results I have posted in this thread.
Apart from the wrong timings, does the visual result / pixels look correct on screen anyway (similar to Steem in that case) ? (I guess it's correct if the problem is just an unnecessary read)
The visual result is perfectly OK, both in the test program and in the actual code I'm working on.
So far it is only a problem of an unnecessary read. I'm focusing on it because this code will (I hope) be integrated into a fullscreen routine where I'm counting every cycle; moreover, since I split my blitting into 16-pixel-wide slices, the extra reads add up to a noticeable cost :wink:
npomarede
Atari God
Posts: 1348
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: Blitter Execution Times

Post by npomarede »

uko wrote:
npomarede wrote: I will have a look at this case; I have some work in progress to fix some wrong cases where xcount=1 (as Ijor described them to me) and not much time at the moment.
I will see if this can be fixed on current code base or if this needs to be part of the rewrite I'm planning to do.
Hi !
Thanks for your support ! (and for all the work on Hatari !)
Feel free to PM me if you have any questions concerning the test code & results I have posted in this thread
Hi
I had a look at the current blitter.c code and, unfortunately, handling this NFSR problem requires the same "structural" changes in the code as handling the xcount=1 case. (Historically, the first blitter code in Hatari lacked accuracy; then in 2008 Tobé contributed a very, very good rewrite that closely followed the blitter's documentation and hugely improved emulation. That code was improved here and there to handle some non-working cases (for example, better bus cycle counting in non-HOG mode), but some parts now need to be rewritten to better match the "real" state machine of the blitter, and these parts can't be fixed without structural changes to blitter.c.)

Hopefully, I will be able to fix this in a near future :)

Nicolas
uko
Atari User
Posts: 36
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Post by uko »

npomarede wrote:Hopefully, I will be able to fix this in a near future :)
Hopefully I'm a slow coder and it will take me a long time to finish my screen! :lol:
Thanks.
ijor
Hardware Guru
Posts: 4013
Joined: Sat May 29, 2004 7:52 pm
Contact:

Re: Blitter Execution Times

Post by ijor »

uko wrote:In the post describing the tests I have performed, when I talk about setting the value of Xcount to 1, it is for SKEW=0. As soon as the SKEW is set to a value different than 0, the program automatically increments Xcount (and sets NFSR). So I have not tested the case you mention (Xcount=1 and NFSR set). Sorry for the misunderstanding.
I see. I missed that. No problem :)
ijor
Hardware Guru
Posts: 4013
Joined: Sat May 29, 2004 7:52 pm
Contact:

Re: Blitter Execution Times

Post by ijor »

czietz wrote:Semi-on-topic: Hatari also differs significantly from real hardware with respect to the timing when blitting from or to a non-existing memory region. This is certainly a corner case, because why would someone do that...
Non-existent memory region? Like accessing the 2nd MB on a machine with only 1 MB of RAM? The timing in that case should be identical to accessing actual existing RAM. Are you saying Hatari uses a different timing?

Or do you mean the Blitter accessing non-RAM space, like ROM or I/O locations? The timing, wait states and alignment are identical and follow the same rules as the CPU accessing the same location. The only exception is accessing the ACIAs.

I know that some programs blit to the Shifter's palette, but there the timing is identical to blitting to RAM.
czietz
Hardware Guru
Posts: 1296
Joined: Tue May 24, 2016 6:47 pm

Re: Blitter Execution Times

Post by czietz »

ijor wrote:
czietz wrote:Semi-on-topic: Hatari also differs significantly from real hardware with respect to the timing when blitting from or to a non-existing memory region. This is certainly a corner case, because why would someone do that...
Non-existing memory region? Like accessing the 2nd MB on a machine with 1 MB RAM only? The timing in this case should be identical as accessing actual existing RAM. You are saying Hatari performs a different timing?
I mean accessing, e.g., addresses between 0x400000 and 0xDFFFFF, or in general any memory region for which no DTACK is generated and Glue eventually generates a bus error.
ijor
Hardware Guru
Posts: 4013
Joined: Sat May 29, 2004 7:52 pm
Contact:

Re: Blitter Execution Times

Post by ijor »

czietz wrote:I mean accessing e.g. addresses between 0x400000 and 0xDFFFFF, or in general any memory region for which no DTACK is generated but eventually a bus error is generated by Glue.
Ah, yes. And this includes not only addresses that provoke a Bus Error when accessed by the CPU. The same timing would occur if the Blitter attempted to access the ACIA space, because no DTACK is generated there, only VPA, and VPA is not connected to the Blitter.
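The cases discussed here can be summarised with a simplified address classifier. This is a hypothetical sketch of the ST/STE map as seen by the Blitter: $400000-$DFFFFF and the ACIA range (assumed here to be $FFFC00-$FFFC07) get no DTACK, while everything else, including RAM beyond what is installed and the Shifter palette, is treated as answering with DTACK. Other unmapped I/O addresses are not modelled.

```c
#include <stdint.h>

enum bus_response {
    BUS_DTACK,       /* normal cycle: RAM, ROM, most I/O */
    BUS_VPA_ONLY,    /* ACIAs: VPA is not wired to the Blitter */
    BUS_NO_RESPONSE  /* no DTACK; Glue eventually raises a bus error */
};

static enum bus_response blitter_bus_response(uint32_t addr)
{
    addr &= 0x00FFFFFFu;                            /* 24-bit address bus */
    if (addr >= 0xFFFC00u && addr <= 0xFFFC07u)
        return BUS_VPA_ONLY;
    if (addr >= 0x400000u && addr <= 0xDFFFFFu)
        return BUS_NO_RESPONSE;
    return BUS_DTACK;                               /* simplified default */
}
```

From the Blitter's point of view, BUS_VPA_ONLY and BUS_NO_RESPONSE behave alike: neither completes the cycle with DTACK.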
fenarinarsa
Atari freak
Posts: 55
Joined: Sat Mar 15, 2014 11:23 pm

Re: Blitter Execution Times

Post by fenarinarsa »

Don't forget that on Mega STE the blitter takes 4 extra cycles to start ;)

I think Leonard had to take this into account for "We Were@" to work correctly on Mega STE.
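A rough total for one blit, including the Mega STE start-up penalty. This hypothetical helper assumes 4 cycles per bus access, an accesses-per-line figure supplied by the caller, and 4 extra cycles per blitter start on Mega STE; any other start-up or bus-arbitration overhead is not modelled.

```c
#include <stdbool.h>

static unsigned long blit_cycles(unsigned long lines, unsigned long accesses_per_line,
                                 bool mega_ste)
{
    unsigned long cycles = lines * accesses_per_line * 4ul; /* 4 cycles per access */
    if (mega_ste)
        cycles += 4ul;                                       /* extra start-up cycles */
    return cycles;
}
```

If a routine restarts the blitter several times (e.g. once per 16-pixel slice), the penalty applies once per start.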