Blitter Execution Times

uko
Atariator
Posts: 27
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Blitter Execution Times

Postby uko » Mon Jan 06, 2020 9:06 pm

Hi !

Up to now I have mainly used the following table to estimate the blitter execution times for sizing my code (and also the Blitter FAQ from The Paranoid of Paradox):
http://retrospec.sgn.net/users/tomcat/miodrag/Atari_ST/Atari%20ST%20Internals.htm#BLITTER

These timings seem OK as long as SKEW and NFSR are set to 0.
But when using other values for these registers, or even when simply changing the mask values to something other than $FFFF, these timings are no longer accurate.
This is logical in fact, and after several tests I now have a better understanding of the bus accesses required by the blitter. Nevertheless, since I am working only with emulators (Hatari and Steem SSE), I am wondering if there is a more exhaustive timing table that could confirm my assumptions.

Do you have any other links that could help me ?

Thanks

npomarede
Atari God
Posts: 1328
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: Blitter Execution Times

Postby npomarede » Mon Jan 06, 2020 9:32 pm

Hi
If you want, you can have a look at Hatari's source code. I think the timings are really accurate, as they were verified with several demos running in overscan while using the blitter at the same time (see the great "We were" by Oxygene for example).
I also compared several of my own test programs on a real STE and on Hatari to check the special case where a long instruction (such as MUL or DIV) is running at the moment the blitter starts.

Look in blitter.c at all the places where Blitter_AddCycles() is called.
But basically, you need to count 4 cycles for each physical read from RAM and 4 cycles for each physical write.

By looking at the logical operation you want to perform with the blitter, you can have up to 2 reads and 1 write. Skewing or masking don't add any additional cycles.
(there are a few corner cases with xcount=1 that are not correctly handled in Hatari yet, but cycles should be OK for 99% of the remaining cases)

Nicolas
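
A minimal Python sketch of the counting rule described in the post above (4 cycles per physical read, 4 per physical write, at most 2 reads and 1 write per word). The function name and the two sets of logical operations are assumptions made for this illustration, not code taken from Hatari's blitter.c. The sketch also assumes that HOP selects the source, and it ignores masks, FXSR/NFSR, SMUDGE and bus sharing with the CPU.

```python
CYCLES_PER_ACCESS = 4  # per physical read or write, as stated in the post above

# Assumption: logic ops whose result does not depend on the source word
# (0 = all zeros, 5 = destination, 10 = NOT destination, 15 = all ones).
OPS_WITHOUT_SOURCE = {0, 5, 10, 15}
# Assumption: logic ops whose result does not depend on the old destination
# (0 = all zeros, 3 = source, 12 = NOT source, 15 = all ones).
OPS_WITHOUT_DEST = {0, 3, 12, 15}


def cycles_per_word(op: int) -> int:
    """Estimate bus cycles for one word, from the logical operation alone."""
    if not 0 <= op <= 15:
        raise ValueError("logical op must be in 0..15")
    reads = int(op not in OPS_WITHOUT_SOURCE) + int(op not in OPS_WITHOUT_DEST)
    writes = 1  # the result word is always written
    return (reads + writes) * CYCLES_PER_ACCESS


if __name__ == "__main__":
    for op in (3, 7):
        print(f"op {op}: {cycles_per_word(op)} cycles per word")
```

Under these assumptions, mode 3 (source) costs 8 cycles per word and mode 7 (source OR destination) costs 12.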

uko
Atariator
Posts: 27
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Postby uko » Mon Jan 06, 2020 10:04 pm

Nicolas,

Thanks for pointing me to the Hatari source code, I'll have a look at it.

I am quite surprised that masking does not add additional cycles: in my attempts (which I must check again, of course), Hatari seems to use more cycles when the mask is not $FFFF. But I am in the case where XCount = 1 (in fact XCount = 1 when SKEW = 0, and XCount = 2 with NFSR = 1 when SKEW <> 0), so that could explain this.
I say I am surprised because, for me, there should not be any difference for the blitter between using:
- mode 7 (source OR destination)
- mode 3 (source) with a mask different from $FFFF

In both cases this corresponds to doing an OR, and the first case requires 12 cycles, whereas the second would only require 8.

But of course I'm going to check all this again before going further.

Thanks for your help.

npomarede
Atari God
Posts: 1328
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: Blitter Execution Times

Postby npomarede » Mon Jan 06, 2020 10:08 pm

Masking itself doesn't add cycles, but the blitter doc states that in some cases, depending on the mask value, an RMW (read-modify-write) operation will be performed, which adds cycles because of the additional memory access required (see this comment in the source " When NFSR or mask is not all '1', a read-modify-write is always performed")

EDIT : note that if the part you want to compare is when XCOUNT=1 and NFSR and/or FXSR are set, then you might hit the case that is not supported yet in Hatari and where the cycle count is certainly not correct.

ijor
Hardware Guru
Posts: 3960
Joined: Sat May 29, 2004 7:52 pm

Re: Blitter Execution Times

Postby ijor » Tue Jan 07, 2020 12:14 am

uko wrote:I say I am surprised because, for me, there should not be any difference for the blitter between using:
- mode 7 (source OR destination)
- mode 3 (source) with a mask different from $FFFF

In both cases this corresponds to doing an OR, and the first case requires 12 cycles, whereas the second would only require 8.


Both cases do have the same timing. As Nicolas is saying, a mask will force a destination read regardless of the mode. But a mask will not alter the timing if the mode already requires a destination read cycle.

npomarede wrote:(see this comment in the source " When NFSR or mask is not all '1', a read-modify-write is always performed")


NFSR doesn't force a destination read cycle. Why would it? NFSR affects the read of the source, not the destination.

It is true that there is not much point in using NFSR without a mask. But if, for some reason, NFSR is set, the active mask is $FFFF and the OP mode doesn't require the original dest data, then Blitter doesn't perform a destination read.
Fx Cast: Atari St cycle accurate fpga core
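
The rule stated in the post above can be summarised in a few lines of Python. This is only a sketch of the behaviour as ijor describes it; the function name and the set of logical operations that ignore the old destination are assumptions made for illustration.

```python
# Assumption: logic ops whose result does not use the old destination word
# (0 = all zeros, 3 = source, 12 = NOT source, 15 = all ones).
OPS_WITHOUT_DEST = {0, 3, 12, 15}


def needs_dest_read(op: int, mask: int, nfsr: bool = False) -> bool:
    """Return True if a destination read is performed for this word.

    The nfsr argument is accepted only to make the point that it plays no
    role here: per the post above, NFSR affects source reads, not
    destination reads.
    """
    del nfsr  # deliberately unused
    return op not in OPS_WITHOUT_DEST or (mask & 0xFFFF) != 0xFFFF


if __name__ == "__main__":
    print(needs_dest_read(7, 0xFFFF))             # mode 7, full mask
    print(needs_dest_read(3, 0x0FFF))             # mode 3, partial mask
    print(needs_dest_read(3, 0xFFFF, nfsr=True))  # mode 3, full mask, NFSR set
```

In this model, mode 7 with a full mask and mode 3 with a partial mask both need the destination read, so they end up with the same timing, while mode 3 with mask $FFFF does not read the destination even when NFSR is set.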

Cyprian
10 GOTO 10
Posts: 1862
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: Blitter Execution Times

Postby Cyprian » Tue Jan 07, 2020 1:27 pm

npomarede wrote:Skewing or masking don't add any additional cycles.

If I'm not wrong:
- when the mask is different from $FFFF, you have to add one bus cycle for the destination read;
- skewing has an impact on cycles when FXSR / NFSR is set.
Portfolio / Lynx II / Jaguar / TT030 / Mega STe / 800 XL / 1040 STe / Falcon030 / 65 XE / 520 STm / SM124 / SC1435
SDrive / PAK68/3 / Lynx Multi Card / LDW Super 2000 / XCA12 / SkunkBoard / CosmosEx / SatanDisk / UltraSatan / USB Floppy Drive Emulator / Eiffel / SIO2PC / Crazy Dots / PAM Net
Hatari / Steem SSE / Aranym / Saint
http://260ste.appspot.com/

npomarede
Atari God
Posts: 1328
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: Blitter Execution Times

Postby npomarede » Tue Jan 07, 2020 1:37 pm

Cyprian wrote:
npomarede wrote:Skewing or masking don't add any additional cycles.

If I'm not wrong:
- when the mask is different from $FFFF, you have to add one bus cycle for the destination read;
- skewing has an impact on cycles when FXSR / NFSR is set.

Hi
Not to be too punctilious, but I would say these 2 conditions have an indirect impact on the total number of cycles, in the sense that skewing (shifting) bits per se doesn't take any cycles, and the same goes for masking (compare with the 68000, for example, where LSL or AND really do take cycles).
What really matters in the end is how many words you read/write, which can be a consequence of the logical operation, or the value of FXSR/NFSR, or the value of the mask.
Nicolas
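
To make the "count the words you read/write" idea concrete, here is a hedged Python sketch that counts bus accesses for one line of a blit. The names `LineAccesses` and `accesses_per_line` are hypothetical and the op sets are assumptions. SMUDGE and HOP are ignored (the source is assumed to be selected), and xcount=1 is excluded because the thread identifies it as a corner case.

```python
from dataclasses import dataclass

OPS_WITHOUT_SOURCE = {0, 5, 10, 15}  # assumption: ops that ignore the source
OPS_WITHOUT_DEST = {0, 3, 12, 15}    # assumption: ops that ignore the old dest


@dataclass
class LineAccesses:
    source_reads: int
    dest_reads: int
    writes: int

    @property
    def cycles(self) -> int:
        # 4 cycles per physical access, as stated earlier in the thread
        return 4 * (self.source_reads + self.dest_reads + self.writes)


def accesses_per_line(op, xcount, masks, fxsr=False, nfsr=False):
    """Count accesses for one line; masks = (first, middle, last) end masks."""
    if xcount < 2:
        raise ValueError("this sketch only covers xcount >= 2")
    first, middle, last = masks
    word_masks = [first] + [middle] * (xcount - 2) + [last]

    source_reads = 0
    if op not in OPS_WITHOUT_SOURCE:
        # FXSR adds one source read per line, NFSR removes the final one
        source_reads = xcount + int(fxsr) - int(nfsr)

    # Destination is read when the op needs it or the word's mask is partial
    dest_reads = sum(
        1 for m in word_masks
        if op not in OPS_WITHOUT_DEST or (m & 0xFFFF) != 0xFFFF
    )
    return LineAccesses(source_reads, dest_reads, writes=xcount)


if __name__ == "__main__":
    line = accesses_per_line(3, 2, (0x0FFF, 0xFFFF, 0xF000), nfsr=True)
    print(line, line.cycles)
```

The point of the design is that skew and mask values never appear as a cost on their own; they only change how many words are fetched.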

uko
Atariator
Posts: 27
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Postby uko » Tue Jan 07, 2020 9:07 pm

ijor wrote:It is true that there is not much point in using NFSR without a mask. But if, for some reason, NFSR is set, the active mask is $FFFF and the OP mode doesn't require the original dest data, then Blitter doesn't perform a destination read.

It may be useful if you use bitplanes as layers, with 16-pixel-wide graphics.

npomarede wrote:What really matters in the end is how many words you read/write, which can be a consequence of the logical operation, or the value of FXSR/NFSR, or the value of the mask.

Yes, you are right, I took shortcuts in my explanation by mixing up cycles and bus accesses.

ijor
Hardware Guru
Posts: 3960
Joined: Sat May 29, 2004 7:52 pm

Re: Blitter Execution Times

Postby ijor » Tue Jan 07, 2020 9:45 pm

npomarede wrote:What really matters in the end is how many words you read/write, which can be a consequence of the logical operation, or the value of FXSR/NFSR, or the value of the mask.


Right. For completeness I should add the SMUDGE setting as well.

uko wrote:
ijor wrote:It is true that there is not much point in using NFSR without a mask ...

It may be useful if you use bitplanes as layers, with 16-pixel-wide graphics.


How would it be useful to enable NFSR without a mask?

uko
Atariator
Posts: 27
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Postby uko » Tue Jan 07, 2020 10:36 pm

ijor wrote:How would it be useful to enable NFSR without a mask?

In very limited cases... But imagine you have a background using only 3 bitplanes, and a one-bitplane sprite of size 16x200 to move over it (with suitable palette settings, of course). In this case, when you set both SKEW and NFSR, you don't mind also writing the non-masked bits, since they won't disturb the background.
And if your sprite is wider than 16 pixels and you display it in slices of 16 pixels (because it is easier to integrate in a fullscreen), you could use this for the first slice on the left.

uko
Atariator
Posts: 27
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Postby uko » Wed Jan 08, 2020 9:45 pm

Hi!

In order to make things clearer (at least for me !), I have written a little test program to perform some tests and count the number of memory accesses performed by the blitter.
The source code is included in the zip file attached to this post.

This program blits a one-bitplane rectangle onto the screen, and the blue background color makes it possible to measure the execution time. The grid has a point every 5 lines.
At the beginning of the source, there are parameters that make it easy to test various configurations:
- X position (if different from 0, then SKEW and NFSR are set)
- Rectangle width (X Count)
- Force mask values (by default they are computed from SKEW value)
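
The test program's source is only available as an attachment, so the following Python sketch is a guess at how such parameters could be derived, based on the description in this thread (SKEW taken from the X position; a non-zero SKEW adds one word to X Count and sets NFSR). The mask formulas and all names are assumptions, not the actual test code.

```python
def blit_params(x: int, width_words: int, force_masks=None) -> dict:
    """Derive hypothetical blitter settings for a rectangle at pixel column x.

    force_masks, if given, is a (left, right) pair that overrides the masks
    computed from the skew value.
    """
    if x < 0 or width_words < 1:
        raise ValueError("x must be >= 0 and width_words >= 1")

    skew = x & 15                 # pixel offset inside the 16-pixel word
    xcount = width_words
    nfsr = False
    left_mask = right_mask = 0xFFFF

    if skew:
        xcount += 1               # shifted data spills into one more word
        nfsr = True               # that extra word needs no new source word
        left_mask = 0xFFFF >> skew            # keep the low (16 - skew) bits
        right_mask = ~left_mask & 0xFFFF      # keep the high skew bits

    if force_masks is not None:
        left_mask, right_mask = force_masks

    return {
        "skew": skew,
        "xcount": xcount,
        "nfsr": nfsr,
        "left_mask": left_mask,
        "right_mask": right_mask,
    }


if __name__ == "__main__":
    print(blit_params(0, 2))
    print(blit_params(4, 2))
    print(blit_params(4, 2, force_masks=(0x0FFF, 0xFFFF)))
```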

I have run several tests with width = 2 words, but also with width = 1 word (even though Nicolas said that there could be some problems with Hatari in this case).
The results are presented in the Excel sheet (one tab per width value), also attached in the zip file. For each test configuration, I have put the theoretical number of accesses per line (at least according to my understanding !), and the results obtained with the latest Hatari and Steem SSE versions. Sorry, I have no real HW to test on...
I hope I have not made any bugs, or errors in reporting the results...

In most cases, my understanding of the theory agrees with both emulators.

But for cases where SKEW/NFSR are set and the right mask is forced to $FFFF, I have found some discrepancies. Hatari and Steem do not perform the same number of memory accesses !
I tend to trust the Steem results more, but if someone could have a look, that would be nice !

BLIT_Test.zip

npomarede
Atari God
Posts: 1328
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: Blitter Execution Times

Postby npomarede » Wed Jan 08, 2020 9:53 pm

Hi
could you provide the executable in your zip archive ? My STE is currently not available (still packed after recently moving home), but maybe some other people here could run your tests on their STE.
Nicolas

uko
Atariator
Posts: 27
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Postby uko » Wed Jan 08, 2020 10:10 pm

npomarede wrote:Hi
could you provide the executable in your zip archive ? My STE is currently not available (still packed after recently moving home), but maybe some other people here could run your tests on their STE.
Nicolas

No problem. The source needs to be recompiled to change the test parameters, but I have compiled a version that tests the strange case.
It is attached to this post.
It requires around 55 scanlines on Hatari and 48 on Steem SSE.

BlitTestTOS.zip

ijor
Hardware Guru
Posts: 3960
Joined: Sat May 29, 2004 7:52 pm

Re: Blitter Execution Times

Postby ijor » Thu Jan 09, 2020 2:13 am

uko wrote:But for cases where SKEW/NFSR are set and the right mask is forced to $FFFF, I have found some discrepancies. Hatari and Steem do not perform the same number of memory accesses !


As I said already, NFSR doesn't force a destination read, which seems to be what Hatari is doing wrong. That seems to be the difference.

uko wrote:... but also with width = 1 word (even though Nicolas said that there could be some problems with Hatari in this case).


Everybody is wrong in this case. If XCOUNT is set to one, Blitter doesn't fully honor the NFSR bit and performs a final source read regardless. It might be considered a Blitter buglet, but you shouldn't need to use NFSR if XCOUNT is one.
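
A small Python sketch of the source read count per line, including the xcount=1 behaviour described in the post above. It only models how many source words are fetched, it assumes the logical operation actually uses the source, and the function name is hypothetical.

```python
def source_reads_per_line(xcount: int, fxsr: bool = False, nfsr: bool = False) -> int:
    """Number of source words read for one line of a blit."""
    if xcount < 1:
        raise ValueError("xcount must be at least 1")

    reads = xcount + int(fxsr)   # FXSR forces one extra read at line start

    if nfsr and xcount > 1:
        reads -= 1               # NFSR: the final source read is skipped
    # With xcount == 1 the final source read is performed regardless of
    # NFSR, as described in the post above, so nothing is subtracted.
    return reads


if __name__ == "__main__":
    for xcount in (1, 2, 3):
        print(xcount, source_reads_per_line(xcount, nfsr=True))
```

With NFSR set, xcount=2 gives a single source read per line, whereas xcount=1 still gives one read instead of zero.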

uko
Atariator
Posts: 27
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Postby uko » Thu Jan 09, 2020 6:57 am

ijor wrote:Everybody is wrong in this case. If XCOUNT is set to one, Blitter doesn't fully honor the NFSR bit and performs a final source read regardless. It might be considered a Blitter buglet, but you shouldn't need to use NFSR if XCOUNT is one.

In the post describing the tests I have performed, when I talk about setting the value of Xcount to 1, it is for SKEW=0. As soon as the SKEW is set to a value different than 0, the program automatically increments Xcount (and sets NFSR). So I have not tested the case you mention (Xcount=1 and NFSR set).
Sorry for the misunderstanding.

npomarede
Atari God
Posts: 1328
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: Blitter Execution Times

Postby npomarede » Thu Jan 09, 2020 4:37 pm

uko wrote:No problem. The source needs to be recompiled to change the test parameters, but I have compiled a version that tests the strange case.
It is attached to this post.
It requires around 55 scanlines on Hatari and 48 on Steem SSE.

BlitTestTOS.zip

Hi
I will have a look at this case ; I have some work in progress to fix some wrong cases where xcount=1 (as Ijor described them to me) and not much time at the moment.
I will see if this can be fixed on current code base or if this needs to be part of the rewrite I'm planning to do.

Apart from the wrong timings, does the visual result / pixels look correct on screen anyway (similar to Steem in that case) ? (I guess it's correct if the problem is just an unnecessary read)

Nicolas

czietz
Hardware Guru
Posts: 1130
Joined: Tue May 24, 2016 6:47 pm

Re: Blitter Execution Times

Postby czietz » Thu Jan 09, 2020 5:05 pm

Semi-on-topic: Hatari also differs significantly from real hardware with respect to the timing when blitting from or to a non-existing memory region. This is certainly a corner case, because why would someone do that...

uko
Atariator
Posts: 27
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Postby uko » Thu Jan 09, 2020 8:16 pm

npomarede wrote:I will have a look at this case ; I have some work in progress to fix some wrong cases where xcount=1 (as Ijor described them to me) and not much time at the moment.
I will see if this can be fixed on current code base or if this needs to be part of the rewrite I'm planning to do.

Hi !
Thanks for your support ! (and for all the work on Hatari !)
Feel free to PM me if you have any questions concerning the test code & results I have posted in this thread.

npomarede wrote:Apart from the wrong timings, does the visual result / pixels look correct on screen anyway (similar to Steem in that case) ? (I guess it's correct if the problem is just an unnecessary read)

The visual result is perfectly OK, both in the test program and in the actual code I'm working on.
So far it is only a problem of an unnecessary read. I'm focusing on it because it will be integrated (I hope so) in a fullscreen routine and I'm counting every cycle ; moreover, since I'll split my blitting into 16-pixel-wide slices, the extra reads add up and end up costing some time :wink:

npomarede
Atari God
Posts: 1328
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: Blitter Execution Times

Postby npomarede » Thu Jan 09, 2020 9:52 pm

uko wrote:
npomarede wrote:I will have a look at this case ; I have some work in progress to fix some wrong cases where xcount=1 (as Ijor described them to me) and not much time at the moment.
I will see if this can be fixed on current code base or if this needs to be part of the rewrite I'm planning to do.

Hi !
Thanks for your support ! (and for all the work on Hatari !)
Feel free to PM me if you have any questions concerning the test code & results I have posted in this thread


Hi
I had a look at the current blitter.c code and unfortunately the same "structural" changes in the code are required to handle this NFSR problem as to handle the xcount=1 case.
Historically, the first blitter code in Hatari was lacking accuracy; then in 2008 Tobé contributed a very, very good rewrite that closely matched the blitter's documentation and hugely improved emulation. This code was improved here and there to handle some non-working cases (improved bus cycle counting in non-hog mode, for example), but some parts now need to be rewritten in a different way to better match the "real" state machine of the blitter. These parts can't be fixed without structural changes in blitter.c.

Hopefully, I will be able to fix this in the near future :)

Nicolas

uko
Atariator
Posts: 27
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Blitter Execution Times

Postby uko » Thu Jan 09, 2020 11:11 pm

npomarede wrote:Hopefully, I will be able to fix this in the near future :)

Hopefully I'm a slow coder and it will take me a long time to finish my screen ! :lol:
Thanks.

ijor
Hardware Guru
Posts: 3960
Joined: Sat May 29, 2004 7:52 pm

Re: Blitter Execution Times

Postby ijor » Thu Jan 09, 2020 11:59 pm

uko wrote:In the post describing the tests I have performed, when I talk about setting the value of Xcount to 1, it is for SKEW=0. As soon as the SKEW is set to a value different than 0, the program automatically increments Xcount (and sets NFSR). So I have not tested the case you mention (Xcount=1 and NFSR set). Sorry for the misunderstanding.


I see. I missed that. No problem :)

ijor
Hardware Guru
Posts: 3960
Joined: Sat May 29, 2004 7:52 pm

Re: Blitter Execution Times

Postby ijor » Fri Jan 10, 2020 12:08 am

czietz wrote:Semi-on-topic: Hatari also differs significantly from real hardware with respect to the timing when blitting from or to a non-existing memory region. This is certainly a corner case, because why would someone do that...


Non-existing memory region? Like accessing the 2nd MB on a machine with 1 MB RAM only? The timing in this case should be identical to accessing actual existing RAM. Are you saying Hatari performs a different timing?

Or do you mean Blitter accessing non-RAM space, like ROM or I/O locations? The timing, wait states and alignment are identical and follow the same rules as the CPU accessing the same location. The only exception is accessing the ACIAs.

I know that some programs blit to the Shifter's palette. But here the timing is identical to blitting to RAM.

czietz
Hardware Guru
Posts: 1130
Joined: Tue May 24, 2016 6:47 pm

Re: Blitter Execution Times

Postby czietz » Fri Jan 10, 2020 6:38 am

ijor wrote:
czietz wrote:Semi-on-topic: Hatari also differs significantly from real hardware with respect to the timing when blitting from or to a non-existing memory region. This is certainly a corner case, because why would someone do that...


Non-existing memory region? Like accessing the 2nd MB on a machine with 1 MB RAM only? The timing in this case should be identical to accessing actual existing RAM. Are you saying Hatari performs a different timing?


I mean accessing e.g. addresses between 0x400000 and 0xDFFFFF, or in general any memory region for which no DTACK is generated but eventually a bus error is generated by Glue.

ijor
Hardware Guru
Posts: 3960
Joined: Sat May 29, 2004 7:52 pm

Re: Blitter Execution Times

Postby ijor » Fri Jan 10, 2020 1:03 pm

czietz wrote:I mean accessing e.g. addresses between 0x400000 and 0xDFFFFF, or in general any memory region for which no DTACK is generated but eventually a bus error is generated by Glue.


Ah, yes. And this includes not only addresses that provoke a Bus Error when accessed by the CPU. The same timing would occur if Blitter attempts to access the ACIA space, because no DTACK is generated in that case, only VPA, and VPA is not connected to Blitter.

fenarinarsa
Atari freak
Posts: 51
Joined: Sat Mar 15, 2014 11:23 pm

Re: Blitter Execution Times

Postby fenarinarsa » Sat Jan 18, 2020 5:56 pm

Don't forget that on Mega STE the blitter takes 4 extra cycles to start ;)

I think Leonard had to take this into account for "We Were@" to work correctly on Mega STE.

