horizontal scrolling on ST

npomarede
Atari God
Posts: 1341
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: horizontal scrolling on ST

Post by npomarede »

troed wrote: Simple cycle counting provides a hypothesis as to how it can happen but ideally we'd need trace diagrams ... ;) Delay HSYNC end for 12 cycles delays left border DE activation with 12 cycles = 6 bytes. Also, according to DHS the whole screen is shifted 8 pixels to the right - which indicates Shifter involvement here. Uhm, but Nicolas writes in video.c that the screen is shifted 8 pixels to the left.
Yes, I don't remember exactly where it was visible, but there are some parts of those DHS demos where it can be seen, otherwise the background color is not aligned with the bitmap (Evil / DHS also sent me some screenshots back in 2010, taken on his STE with different overscans, to show the difference in centering).
The +20 STE overscan shifts the screen 8 pixels to the left, while the "classic" +26 STF/STE shifts the screen 4 pixels to the left.

By the way, for another demo that had problems running on my real STF, could you have a look at Sync's vectorball screen in Swedish New Year II? Depending on the wake-up state of my STF, the bottom border will work or not (lots of flickering). It's possible the bottom border switches are timed too tightly, to use as little CPU as possible, and the soundtrack's interrupt adds some jitter, but it really seems to be WU dependent. Which Sync member should we blame for this? :)

Nicolas
troed
Atari God
Posts: 1460
Joined: Mon Apr 30, 2012 6:20 pm
Location: Sweden

Re: horizontal scrolling on ST

Post by troed »

(BTW all, I've now broken out the MMU-DE detection from the GLUE state machine. I don't know if it became more readable. If I get enough info to include STE 20 byte left border I will break it out into ST and STE versions as well and then forum posts might not be a good way of distributing the info)
npomarede wrote: Yes, I don't remember exactly where it was visible, but there are some parts of those DHS demos where it can be seen, otherwise the background color is not aligned with the bitmap (Evil / DHS also sent me some screenshots back in 2010, taken on his STE with different overscans, to show the difference in centering).
The +20 STE overscan shifts the screen 8 pixels to the left, while the "classic" +26 STF/STE shifts the screen 4 pixels to the left.
Thanks - I'm thinking this is due to pre-fetch for hires and all info helps.
By the way, for another demo that had problems running on my real STF, could you have a look at Sync's vectorball screen in Swedish New Year II? Depending on the wake-up state of my STF, the bottom border will work or not (lots of flickering). It's possible the bottom border switches are timed too tightly, to use as little CPU as possible, and the soundtrack's interrupt adds some jitter, but it really seems to be WU dependent. Which Sync member should we blame for this? :)
Ah if we had only known about wakeups back then ;) I do believe I've heard others complain about that lower border, although I was inclined to believe it was people with "slow" CPUs rather than specific wakeups. I'll make a note to test it myself when I re-connect my STF (currently the STE is set up, it just had a DMA chip transplant).

Oh, and those screens were written by Redhead :)
Steven Seagal
Fuji Shaped Bastard
Posts: 2018
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed

Re: horizontal scrolling on ST

Post by Steven Seagal »

npomarede wrote: The +20 STE overscan shifts the screen 8 pixels to the left, while the "classic" +26 STF/STE shifts the screen 4 pixels to the left.
In emulation nice cases to see this are Wot a Scorcher's hidden screen for the +26 line and Riverside for the +20 line, because they mix overscan and normal lines.
In the CIA we learned that ST ruled
Steem SSE: http://sourceforge.net/projects/steemsse

Re: horizontal scrolling on ST

Post by Steven Seagal »

troed wrote:Your comment about "line +2 rules for STF and STE being strange" is actually a perfect example.
No it's a poor example because I was very confused by loSTE in STE mode.
Because of bad thresholds, the calibration test failed and bad timings were used later, which I worked around with hacks.
The code is now straightforward.

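Code:

    // t = line cycle at which the frequency is tested for the start of a +2 line
    t=54-16; // the line must "start" earlier on STE due to HSCROLL
#if defined(SS_MMU_WAKE_UP_SHIFTER_TRICKS)
    if(MMU.WakeUpState2())
      t+=2; // wake-up state 2: test 2 cycles later
#endif

#if defined(SS_STF)
    if(ST_TYPE!=STE)
      t+=16; // STF: undo the 16-cycle STE adjustment
#endif

    // +2 line: 60Hz at cycle t, and 50Hz at cycle 376
    // (or currently 50Hz if the line hasn't reached cycle 376 yet)
    if(CyclesIn>t && FreqAtCycle(t)==60 &&
      (FreqAtCycle(376)==50 || (CyclesIn<376 && shifter_freq==50))) //TODO WU?
      CurrentScanline.Tricks|=TRICK_LINE_PLUS_2;
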
And it's not looking for specific switches anymore. This is what you should compare with your pseudo-code.
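
For readers who want to check the threshold arithmetic without building Steem, here is a minimal Python sketch of it. The function name and its boolean parameters are made up for illustration; only the constants 54, 16 and 2 come from the snippet above.

```python
def line_plus_2_threshold(is_ste: bool, wake_up_state_2: bool = False) -> int:
    """Cycle 't' at which the Steem snippet tests for 60Hz.

    Hypothetical helper: only the constants 54, 16 and 2 come from the post.
    """
    t = 54 - 16              # STE: the line "starts" 16 cycles earlier (HSCROLL)
    if wake_up_state_2:
        t += 2               # wake-up state 2: test 2 cycles later
    if not is_ste:
        t += 16              # STF: undo the 16-cycle STE adjustment
    return t


for is_ste in (True, False):
    for ws2 in (False, True):
        print("STE" if is_ste else "STF",
              "WU2" if ws2 else "not WU2",
              line_plus_2_threshold(is_ste, ws2))
```

So the four combinations allowed by the snippet land on cycles 38, 40, 54 and 56.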

Edit: here are some traces

Code:

loSTE: the program tests for +2 after removal of the bottom border
If we fail to make +2 during those tests, the timings will be wrong later.
STE mode
Y200 C0  052:S0002 512:T0002 512:#0162
VBL 719 shifter tricks 3317
Y200 C4  056:S0002 512:T0002 512:#0162
VBL 720 shifter tricks 3317
Y200 C8  060:S0002 508:T0002 508:#0162
VBL 721 shifter tricks 3317
Y200 C12  064:S0002 508:T0002 508:#0162
VBL 722 shifter tricks 3317
Y200 C16  068:S0002 508:T0002 508:#0162
VBL 723 shifter tricks 3317
Y200 C20  072:S0002 508:T0002 508:#0162
VBL 724 shifter tricks 3317
This results in this line during the demo:
199 - 420:S0000 504:S0002 512:T0200 512:#0160
But if test fails:
199 - 484:S0000 512:T0200 512:#0160
200 - 056:S0002    -> +2

Re: horizontal scrolling on ST

Post by troed »

Steven Seagal wrote: No it's a poor example because I was very confused by loSTE in STE mode.
Because of bad thresholds, the calibration test failed and bad timings were used later, which I worked around with hacks.
The code is now straightforward.

loSTE: the program tests for +2 after removal of the bottom border
If we fail to make +2 during those tests, the timings will be wrong later.
Right, so, I just want to make clear that I think that emulating the behaviour as seen in demos is a very good way to make sure that existing software can be run and displayed as if we're on real hardware. I think you, the Hatari crew and previous emulator developers are doing an excellent job there.

The reason why I'm going on about a state machine is because of the future ;) Being able to simulate rather than emulate would make the implementation simpler as well as closer to the real hardware. It's just about reasonably computationally possible today, and will be even more so in a few years. For FPGA-implementation it probably already is.

However, we're not there yet. The knowledge needed is still researched. The discussions here are really helpful in advancing that knowledge.

With all that said - just to make it clear where I'm coming from - LoSTE doesn't test for +2 lines, at least not at all intentionally, when calibrating the lower border :P If that's what's needed for emulation then excellent - but the actual border code just switches 60/50 over a few different positions separated by one nop and checks to see whether the border opened. It's also the same code for ST and STE (for the lower border, the top border timing is modified depending on STE and detected wakeup). If the calibration loop is indeed that tight that not creating a +2 line makes it fail then that's probably a bug in LoSTE and something I should've picked up when testing on real hardware (and in Hatari).

I've documented a few things that didn't end up as planned with LoSTE: removing the RESET instruction from cleanup before launching the screens had side effects; Shifter words can't be properly cleared even when staying in 60Hz for one VBL on exit, which causes the Loading logo to be shifted sometimes; and the top border timing code is unstable on both the "slow" STF hardware and the "fast" STE. I'll make a note of verifying this as well. It really surprises me.

Thanks

/Troed

(I've spent the weekend replacing the DMA chip in my STE and will hopefully find time to play with +20 left border on real hardware soon. I might start writing up the Shifter state machine in the meantime - it would've probably saved me countless hours to have had that when I tested my sync scroll combos for the different wakestates in LoSTE instead of just brute forcing them all ... )

Re: horizontal scrolling on ST

Post by Steven Seagal »

troed wrote:
With all that said - just to make it clear where I'm coming from - LoSTE doesn't test for +2 lines, at least not at all intentionally, when calibrating the lower border :P If that's what's needed for emulation then excellent - but the actual border code just switches 60/50 over a few different positions separated by one nop and checks to see whether the border opened. It's also the same code for ST and STE (for the lower border, the top border timing is modified depending on STE and detected wakeup). If the calibration loop is indeed that tight that not creating a +2 line makes it fail then that's probably a bug in LoSTE and something I should've picked up when testing on real hardware (and in Hatari).
I see, this is a confusing program indeed.
I could only guess at the intent, but in the traces above, this makes the difference between correct display (test lines are 162) and bogus +2 at line 200 (test lines are 160). I thought you wanted to make sure your bottom border removal wouldn't trigger a +2 by accident. In the traces, shifter tricks 3317, the $300 part means 'top' ($100) and bottom ($200) off, so the border is always removed. But forget it, there could be another Steem bug.
Anyway, this demo improved '0', '+2' and '-106' line emulation in Steem, so keep them coming. :thumbs:
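
The flag arithmetic described in this post can be sketched in a few lines of Python. Only the two bit values ($100 for top, $200 for bottom) come from the post; the constant names are invented here and are not Steem's identifiers, and the value 3317 from the traces is assumed to be printed in hexadecimal.

```python
TRICK_TOP_OFF = 0x100     # hypothetical name; bit value from the post
TRICK_BOTTOM_OFF = 0x200  # hypothetical name; bit value from the post


def borders_removed(tricks: int) -> dict:
    """Report which vertical borders a 'shifter tricks' mask marks as removed."""
    return {
        "top": bool(tricks & TRICK_TOP_OFF),
        "bottom": bool(tricks & TRICK_BOTTOM_OFF),
    }


# Mask shown in the traces, assumed to be hexadecimal.
print(borders_removed(0x3317))
```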
MasterOfGizmo
Atari God
Posts: 1343
Joined: Fri Feb 08, 2013 12:15 pm

Re: horizontal scrolling on ST

Post by MasterOfGizmo »

Please forgive me for capturing your great thread. Does one of you have a pointer to documentation of the exact video modes of an ST (not talking about any scrolling or overscan tricks)? I am searching for the very basic timing parameters to re-create it in Verilog.

If you can/want to help out please reply here http://www.atari-forum.com/viewtopic.php?f=101&t=25522

I promise not to capture this thread again ...
MIST board, FPGA based Atari STE and more: https://github.com/mist-devel/mist-board/wiki

Re: horizontal scrolling on ST

Post by troed »

Hi all,

I believe I have a working and empirically tested hypothesis consistent with theory as to how the wakestates arise :P This post documents that hypothesis.

As always, knowledge does not appear out of thin air. Thanks to Dio for measurements and theory, to mc6809e for voicing the 32->8 clock boundaries as causal and to Paolo - as always - for the wakestate discovery, tests and documentation as found in his Excel sheet.

With that said;

Prerequisites:

1) As seen from the CPU, the GLUE - while also at 8MHz - can be offset 0-3 cycles due to lack of synchronization when powered on. This means that when we talk about "cycle 56" it can in reality mean cycle 56-59 for GLUE.

2) Both FREQ (ff820a) and RES (ff8260) are inside GLUE. While Shifter has a copy of ff8260, all changes to these two registers that affect sync are handled by GLUE and GLUE only.

3) When GLUE checks the state of FREQ and RES, due to unknown implementation/wiring/signal-propagation reasons, RES checks are one cycle later than FREQ checks.

Result:

The above causes the following timing possibilities if we look at when GLUE really checks (as seen by the CPU) the interesting "cycle 56" position for 000-byte line:

FREQ 56, RES 57
FREQ 57, RES 58
FREQ 58, RES 59
FREQ 59, RES 60

Depending on whether FREQ == 50 and RES == LO when these checks are made, H (thus DE) is raised one cycle later (58-61).

Dio has documented that depending on wakestate, there's a lag from raised DE to MMU raising LOAD of 3-6 cycles. The MMU is not affected by the GLUE 0-3 offset, which means it looks like this:

MMU detects GLUE DE at cycle 62* and raises LOAD at cycle 64

64-58 = 6 = DL6
64-59 = 5 = DL5
64-60 = 4 = DL4
64-61 = 3 = DL3

If we for a second shift our attention to visible pixels on screen, we must remember that GLUE decides (through HSYNC) where the screen is physically placed by the monitor. If GLUE is "late" 3 cycles, as in the DL3 example above, the distance between the screen start and the LOADed pixels displayed by the Shifter will be shorter. We should thus see the screen being shifted one pixel per DL-state above - which is exactly what Dio has documented. DL3 leftmost, DL6 rightmost.
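
The cycle arithmetic above can be written out as a minimal Python sketch. It assumes exactly what the post states (FREQ checked at 56 plus the GLUE offset, RES one cycle later, DE one cycle after that, LOAD fixed at cycle 64); the function and key names are made up for illustration.

```python
MMU_LOAD_CYCLE = 64    # from the post: MMU raises LOAD at cycle 64
BASE_FREQ_CHECK = 56   # the "cycle 56" check for the 000-byte line


def glue_timing(offset: int) -> dict:
    """Check cycles as seen by the CPU for a GLUE offset of 0..3 cycles."""
    if offset not in range(4):
        raise ValueError("GLUE offset must be 0..3")
    freq_check = BASE_FREQ_CHECK + offset
    res_check = freq_check + 1      # RES is checked one cycle after FREQ
    de_raised = res_check + 1       # H (thus DE) is raised one cycle later
    dl = MMU_LOAD_CYCLE - de_raised
    return {
        "freq": freq_check,
        "res": res_check,
        "de": de_raised,
        "dl": dl,
        "pixels_right_of_dl3": dl - 3,  # one pixel per DL state, DL3 leftmost
    }


for offset in range(4):
    print(offset, glue_timing(offset))
```

Under these assumptions the four offsets give DL6, DL5, DL4 and DL3 in that order.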

Alright, back to GLUE being offset 0-3 cycles compared to the CPU. There's no way for us to test or detect this from software, we simply don't have one-cycle resolution on the ST. We do however have two-cycle resolution thanks to the use of EXG before MOVE instructions. If we try to change the values of FREQ and RES with as much detail as possible, for GLUE to pick it up, it will look like this:

Changes made by CPU at FREQ 56, RES 56 - read by GLUE at FREQ 56, RES 57
Changes made by CPU at FREQ 56, RES 58 - read by GLUE at FREQ 57, RES 58
Changes made by CPU at FREQ 58, RES 58 - read by GLUE at FREQ 58, RES 59
Changes made by CPU at FREQ 58, RES 60 - read by GLUE at FREQ 59, RES 60

The above is not guesswork. Those are the exact values as documented by Paolo and I have posted them before, but like this:

WS1 (DL6): screen DE is FREQ 56, RES 56
WS3 (DL5): screen DE is FREQ 56, RES 58
WS4 (DL4): screen DE is FREQ 58, RES 58
WS2 (DL3): screen DE is FREQ 58, RES 60

(Verify with the Excel sheet - Default RES values are for WS3/4. In WS1 all RES state checks happen 2 cycles earlier, in WS2 2 cycles later. Default FREQ values are for WS1/3, in WS2/4 all FREQ state checks happen 2 cycles later. The above is the result for a specific position)
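
One way to read the two lists above is that the CPU can only place a write on an even cycle, so the latest useful write is the last even cycle at or before the GLUE check. The Python sketch below encodes that reading; it is an interpretation, not a confirmed hardware rule, and the offset-to-wakestate table is simply inferred by pairing the two lists in this post.

```python
# Inferred by pairing the CPU/GLUE list with the WS list in this post.
WAKESTATE_BY_OFFSET = {0: "WS1", 1: "WS3", 2: "WS4", 3: "WS2"}


def last_even_cycle(cycle: int) -> int:
    """Latest even cycle at or before 'cycle' (assumed two-cycle CPU resolution)."""
    return cycle - (cycle % 2)


def latest_cpu_writes(offset: int) -> tuple:
    """(FREQ, RES) write cycles that GLUE still picks up for a given offset."""
    freq_check = 56 + offset   # GLUE FREQ check as seen by the CPU
    res_check = 57 + offset    # RES check is one cycle later
    return last_even_cycle(freq_check), last_even_cycle(res_check)


for offset, name in WAKESTATE_BY_OFFSET.items():
    freq, res = latest_cpu_writes(offset)
    print(f"{name}: FREQ {freq}, RES {res}")
```

With this rule the four offsets reproduce the FREQ/RES pairs listed for WS1, WS3, WS4 and WS2.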

Conclusion: The known detectable wakestates are the result of GLUE being offset 0-3 cycles compared to the CPU - which fits with theory (unsynchronized initialisation at same clock) as well as observation (DE-to-LOAD, visible pixel position on screen) and empirical testing of when changes to FREQ and RES have to be made for GLUE to detect them.

Comments welcome :)

(There might be some uncertainty as to exactly at which cycle which signal is raised and detected but I don't believe it will change the conclusion)

/Troed

PS: All other known wakestates (Spectrum 512 dots, banded/non-banded etc.) are most likely in Shifter, whose 32MHz clock, which it then divides further, causes similar possibilities. It will also be "lagged" because it receives DE directly from GLUE with the possible 0-3 cycle offset, yet receives LOAD and data from MMU at a fixed cycle (as fixed as it can be given the different clocks, 16 and 32). That might be the reason why WS1/DL6 can't be made unstable for 4-pixel scrolling - I'll get back to that when I've "finished" the Shifter state machine. edit: Oh, and yes, this should explain why there's a WS3 that sometimes behaves as WS1 when it comes to unstabilization - the mapping between GLUE wakestates and Shifter isn't 1:1 although it seems to be highly influenced.

*) Why doesn't it detect DE at 58 and 60 as well, when it's raised that early? I don't know.
Last edited by troed on Mon Sep 30, 2013 11:37 am, edited 1 time in total.

Re: horizontal scrolling on ST

Post by Steven Seagal »

troed wrote: Comments welcome :)
Well, it demands time to digest, but right now I have a question.
Is this compatible with the simpler WU view I posted before, that is every 4 cycles, starting from linecycle 0, 2 are for the CPU, 2 for the MMU, and the order determines WU1 (CPU/MMU) or 2 (MMU/CPU)?

Re: horizontal scrolling on ST

Post by Steven Seagal »

MasterOfGizmo wrote: Please forgive me for capturing your great thread. Does one of you have a pointer to documentation of the exact video modes of an ST (not talking about any scrolling or overscan tricks)? I am searching for the very basic timing parameters to re-create it in Verilog.

If you can/want to help out please reply here http://www.atari-forum.com/viewtopic.php?f=101&t=25522

I promise not to capture this thread again ...
I think Dio answered in your thread and he's more of a hardware expert. Have you checked his trace graphs?
What could help is looking at the SM124 & SC1224 service manuals in pdf, they may include timing charts.
I'm preparing some doc myself but it's not ready yet.

Below is a copy/paste of an Excel sheet. The format is very bad, sorry; I'm not making fun, it's just to show that I can't help much more yet. Some of the info comes from the (also incomplete) pdf files.
(PAL "emulator" STF)
:oops:

Code:

Frequency                     50            60            72
shift mode                    -             -             2
cycles/scanline               512           508           224
# scanlines                   313           263           500
top                           64            34            34
DE                            200           200           400
bottom                        49            29            66
cycles/frame                  160256        133604        112000
cycles/sec                    8021248       8021248       8021248
refresh rate (Hz)             50.05271565   60.03748391   71.61828571
#HBL/sec                      15666.5       15789.85827   35809.14286

DE cycles                     320           320           160
Bytes fetched                 160           160           80
Start of DE cycle             56            52            4
left off bonus                26            24            0
First pixel cycle             84            80            32
End of DE cycle               376           372           164
HBL cycle                     464           460           180
cycles to HBL                 88            88            16
right off bonus               44            44            0
cycles to new line            48            48            44
right off bonus               24            24            0
(sum)                         512           508           224

(no label)                    8000000       8000000       8000000
(no label)                    -             -             0.000028
(no label)                    -             -             160
(no label)                    -             -             28

Monitor
Hor freq (kHz)                -             -             35.7
Vert freq (Hz)                -             -             71.2
horizontal retrace time (µs)  -             -             6.3
vertical retrace time (µs)    -             -             420

line (µs)                     64            63.5          28
hsync (µs)                    -             5             3
hblank                        -             10.5          3

left                          -             6             3.5
de                            -             40            20
right                         -             7             1.5
(sum)                         -             53            25

(sum)                         -             63.5          28

vblank ms                     -             1.396         -
vsync                         -             0.19          -

all                           -             16.69         -

top                           -             1.184         -
de                            -             12.7          -
bottom                        -             1.41          -
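
The derived rows in the upper part of the table can be re-computed from the basic ones. The following Python sketch does that, using only numbers from the table; the dictionary layout and function name are mine.

```python
CPU_CLOCK_HZ = 8021248  # "cycles/sec" row of the table

# frequency label -> (cycles per scanline, top, DE, bottom), all from the table
MODES = {
    50: (512, 64, 200, 49),
    60: (508, 34, 200, 29),
    72: (224, 34, 400, 66),
}


def frame_stats(cycles_per_line: int, top: int, de: int, bottom: int) -> dict:
    """Recompute scanlines, cycles/frame, refresh rate and HBL rate."""
    scanlines = top + de + bottom
    cycles_per_frame = cycles_per_line * scanlines
    return {
        "scanlines": scanlines,
        "cycles_per_frame": cycles_per_frame,
        "refresh_hz": CPU_CLOCK_HZ / cycles_per_frame,
        "hbl_per_sec": CPU_CLOCK_HZ / cycles_per_line,
    }


for freq, rows in MODES.items():
    print(freq, frame_stats(*rows))
```

The recomputed values agree with the "# scanlines", "cycles/frame", "refresh rate" and "#HBL/sec" rows, so those rows are consistent with the 8021248 Hz clock.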


Re: horizontal scrolling on ST

Post by Steven Seagal »

troed wrote: Can the STE hardware scroll in high res as well? If so, it needs 16 cycles (?) earlier pre-fetch compared to ST for high res as well - which should mean that the left border switch is earlier (which is true - "late" ST left border code, like ours in SYNC - doesn't work on STE) and DE is earlier. 8 - 16 = 504 so it sort of matches up, although it doesn't explain why a left border switch at 0 works. I wonder if this means that the STE HSYNC pulse is shorter. Also, maybe the prefetch is only 4 cycles for high res - I'm just speculating here and haven't studied it enough. STE is "after my time", you know ;)

http://atari-ste.anvil-soft.com/html/devdocu2.htm
Coming back to this...
HSCROLL works in med res, the demo Cool STE proves it.
When I "fixed" it (meaning fixing Steem), I noticed that HSCROLL was in pixels, not in cycles, and this is what the STE programmer would expect. So, one raster. Probably the same for HIRES.
This would mean 16 cycles of scroll/prefetch in low res, 8 in med res and only 4 in HIRES.
What's nice about it is that it explains the STE limit for "left off": 6-4 = 2 (a new revelation!)
Someone could test this with +2 lines in med res, e.g. whether S2 at cycle 44 works.

Re: horizontal scrolling on ST

Post by npomarede »

Steven Seagal wrote: This would mean, 16 cycles scroll/prefetch in low res, 8 in med res and only 4 in HIRES.
What's nice about it is that it explains the STE limit for "left off": 6-4 = 2 (a new revelation!)
Someone could test this with +2 lines in med res, e.g. whether S2 at cycle 44 works.
I don't have an STE to test this, but I don't think scrolling prefetch is different from the normal STF shifter fetching, where the shifter always reads 2 bytes every 4 cycles, whatever the resolution is. As the scrolling prefetch data needs to be sent to the "normal" internal shifter registers used for each plane, you need to prefetch at the same speed as the shifter normally fetches.
This could be tested on a real STE by reading ff8209 during border time (starting at cycle 0 of a line), then doing something like this to get a rough idea of where the address gets incremented and by how many bytes at a time:

Code:

; get in sync with ff8209 and wait for cycle 0 on next line
; a0=$ff8209    a1=small buffer for at least 8 bytes
    rept 8
    move.b (a0),(a1)+    ; 12 cycles per read
    endr

This only has 12 cycles precision; by adding 1 or 2 nops before it, it's possible to get down to 4 cycles precision and see where the value read from (a0) changed.
Running this piece of code with ff8260=0, then 1, then 2 and comparing the collected values will show if prefetch is done at a different rate depending on the resolution.
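
To illustrate why three runs with 0, 1 and 2 nops give 4-cycle precision, here is a small Python simulation of the sampling. It is only a toy model: the counter is assumed to advance by 2 bytes every 4 cycles from a hypothetical start cycle (40 is picked arbitrarily), and the exact point inside the 12-cycle move.b where the read happens is ignored.

```python
MOVE_B_CYCLES = 12   # move.b (a0),(a1)+ on a 68000
NOP_CYCLES = 4


def counter_low_byte(cycle: int, fetch_start: int) -> int:
    """Toy model of ff8209: +2 bytes every 4 cycles once fetching has started."""
    if cycle < fetch_start:
        return 0
    return ((cycle - fetch_start) // 4 * 2) & 0xFF


def sample_run(nops: int, fetch_start: int, reads: int = 8) -> list:
    """(cycle, value) pairs collected by one run of the rept loop."""
    first = nops * NOP_CYCLES
    return [(first + i * MOVE_B_CYCLES,
             counter_low_byte(first + i * MOVE_B_CYCLES, fetch_start))
            for i in range(reads)]


def first_change(samples: list):
    """Cycle of the first sample whose value differs from the previous one."""
    for (_, prev), (cycle, value) in zip(samples, samples[1:]):
        if value != prev:
            return cycle
    return None


FETCH_START = 40  # hypothetical, for illustration only
single = sample_run(0, FETCH_START)
merged = sorted(s for n in range(3) for s in sample_run(n, FETCH_START))
print("one run    :", first_change(single))
print("three runs :", first_change(merged))
```

A single run only brackets the first increment within a 12-cycle window, while merging the three runs narrows it to 4 cycles. Comparing the merged values for ff8260=0, 1 and 2 would then show whether the rate differs per resolution.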

Nicolas

Re: horizontal scrolling on ST

Post by troed »

Steven Seagal wrote: Well, it demands time to digest, but right now I have a question.
Is this compatible with the simpler WU view I posted before, that is every 4 cycles, starting from linecycle 0, 2 are for the CPU, 2 for the MMU, and the order determines WU1 (CPU/MMU) or 2 (MMU/CPU)?
Well, as long as it means the GLUE will become offset by 2 cycles as seen by the CPU, you will go from DL6/DL5 to DL4/3 (WU1 and WU2 as collectively known). I'm less sure about other side effects of CPU/MMU reordering (and what effect it would have on my hypothesis if it happens in real hardware. Maybe it explains why some machines are more inclined towards one wakeup than "the" other?).

I'm going to try to shoot holes in my hypothesis for a while now and invite everyone else to as well ;) If it survives then it might turn out to be a viable model.

(I have also given the Shifter state machine some initial work and so far I'm very confused. Maybe it's just me, but if I were to write some synclines right now I would not be using stabilizers in the same way as I have so far .. and that's not just because at least the hi/lo ones delay BLANK and risk messing up HSYNC if there's pixel data there :P)

Re: horizontal scrolling on ST

Post by troed »

troed wrote: (I've spent the weekend replacing the DMA chip in my STE and will hopefully find time to play with +20 left border on real hardware soon.
Now done. A similar description to Paolo's below:

Code:

500     HSYNC end
502     left border first possible move to HI
504
506
508
510
0       left border last possible move to HI
2       first possible switch back for 20 byte border
4
6       first possible switch back for 26 byte border

Remember, so far everything points to STE behaving as ST wakestate 1 - that is - 0 cycle offset between CPU and GLUE. This is fully logical due to MMU and GLUE having been merged into one in the STE. So, when comparing the above to Paolo's Excel sheet, all ff8260 timings in there should have 2 subtracted to end up with wakestate 1 values.

Findings:

1) HSYNC end is indeed 4 cycles earlier on STE than ST. I will continue investigating whether this is due to a shorter pulse or if it's a change in start timing as well.

2) "IF(RES == HI) H = TRUE" is also 4 cycles earlier on STE. This is likely due to pre-fetching, I will understand it better when I'm further into my work on the Shifter state machine

3) There's a check at cycle 4 that decides between classic 26 (if HI) byte border or 20 (if LO). I'm unsure exactly what that check is and what consequence it has - although I could speculate that it causes the Shifter (which has a copy of 8260) to throw away what had been pre-fetched so far.

4) Not related, but when testing HSYNC end (known as "No Line 2" in Paolo's Excel sheet and one of several "0-byte lines" by emulators) I verified that it disrupts sync. It might be interesting to at least note this in emulator trace output - demos using that line will be discolored on my TV.
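
The STE left border timings and finding 3 can be summed up in a short Python sketch. It encodes only what is listed above (the HI window from 502 to 0 on a 512-cycle line, and the check at cycle 4); what exactly happens inside the hardware at that check is still unknown, and the function names are made up.

```python
LINE_CYCLES = 512       # 50Hz line length
HI_WINDOW_FIRST = 502   # first possible move to HI
NORMAL_LINE_BYTES = 160


def in_hi_window(cycle: int) -> bool:
    """True if a move to HI at this line cycle falls in the 502..0 window."""
    c = cycle % LINE_CYCLES           # cycle 512 wraps to cycle 0 of the next line
    return c >= HI_WINDOW_FIRST or c == 0


def left_border_bytes(res_is_hi_at_cycle_4: bool) -> int:
    """Extra bytes for the left border, decided by the check at cycle 4."""
    return 26 if res_is_hi_at_cycle_4 else 20


for hi_at_check in (False, True):
    extra = left_border_bytes(hi_at_check)
    print(hi_at_check, extra, NORMAL_LINE_BYTES + extra)
```

Switching back at cycle 2 leaves RES at LO for the check (20 bytes), while holding HI until cycle 6 gives the classic 26 bytes.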

I will update my GLUE (and MMU) state machine with both the wakestate hypothesis and separate into ST and STE later. I'm thinking the forum wiki and then comment on updates here.

/Troed

Re: horizontal scrolling on ST

Post by troed »

troed wrote: 3) There's a check at cycle 4 that decides between classic 26 (if HI) byte border or 20 (if LO). I'm unsure exactly what that check is and what consequence it has - although I could speculate that it causes the Shifter (which has a copy of 8260) to throw away what had been pre-fetched so far.
This theory gains support the more I test different things on STE to see where it differs from ST (more than I thought, btw). I've just created 164 and 166 byte lines as well :P

(Which is of little interest on STE of course, but for the novelty)

/Troed
Dio
Captain Atari
Posts: 451
Joined: Thu Feb 28, 2008 3:51 pm

Re: horizontal scrolling on ST

Post by Dio »

troed wrote: Remember, so far everything points to STE behaving as ST wakestate 1 - that is - 0 cycle offset between CPU and GLUE. This is fully logical due to MMU and GLUE having been merged into one in the STE. So, when comparing the above to Paolo's Excel sheet, all ff8260 timings in there should have 2 subtracted to end up with wakestate 1 values.
One important thing to note when considering this is that on the STE the sync register is in the video domain, not the CPU domain. It's impossible to do odd phase accesses to it.

Another thing to consider is that there may be a completely different wakestate in the STE, an actual DL2. My theory is that on the standard ST the clock cascade between the 8MHz and 16MHz clock and the propagation delay in the Glue causes it to just miss the accept window for DE and that adds 4 cycles to the latency. However, it's possible that in the STE the propagation delays are lower and so it can actually achieve DL2.

LOAD and DE are both available on the board (they're not internal signals, since they have to travel from the MCU to the GST Shifter), but I haven't got an STE in bits handy. (Plus the STE is a much less interesting candidate for examination to me, given that its importance in the scheme of things is pretty low.)

Re: horizontal scrolling on ST

Post by troed »

All: I've written up the STE GLUE state machine now, as well as having started on the Wiki article. It's not linked to from anywhere yet since I consider it to be in a draft state - specifically the section on the MMU should probably be worded differently (it's likely not a state machine) and its timing is probably 2 cycles off.

Remember, it's a wiki. Feel free to edit as you see fit - I do not consider myself to "own" it just because I wrote it and updated Alien's old state machines :P

http://atari-forum.com/wiki/index.php?t ... _Scanlines
Dio wrote:One important thing to note when considering this is that on the STE the sync register is in the video domain, not the CPU domain. It's impossible to do odd phase accesses to it.
(I might misunderstand you, if so I'm sorry. I'm also writing for the much bigger general audience whom I've heard read this thread ;))

For quite some time I couldn't wrap my head around the empirical description (which obviously was correct) of how the 000-byte line was produced. The problem turned out to be that I was stuck (as I know others have been) in thinking ff820a was "Shifter" and ff8260 was "GLUE". I saw no way for the Shifter to be able to take a decision (60/50Hz switch) at cycle 56 in WS1 which could cause a 000-byte line, yet the GLUE had the same possibility at the same cycle with its HI/LO switch. Two separate circuits can't communicate in 0 cycles, at least that was what confounded me.

The solution was quite obvious (and I believe I've seen Ijor comment upon it already in 2006 here on the forum) but it took until I found the following article for me to realise what was wrong: http://info-coach.fr/atari/hardware/STE-HW.php
The configuration register for synchro (50/60 Hz and synchro int. /ext.) is in GST MCU (and GLUE of the STF) and not in the Shifter, as the localization of the address of this register (see figure 10) could let it think . It is for this register that the GLUE of the STF is provided with two data pins connected to bits 8 and 9 of the bus, which corresponds well to the 2 modifiable bits of the register. But if that explains how the GLUE knows if it must send 50 or 60 Hz synchros, how does it knows that it must send 71 Hz (high-resolution)?
In fact, if you look at the video addresses (figure 10), you will notice that another register is also on bits 8 and 9: VIDEOMOD which makes it possible to indicate the resolution to the shifter, but also to the GLUE (GST MCU on STE). Thus at address FF8260 correspond two registers one in the shifter and one in the GLUE. The only video registers which are in the shifter of the STF are the COLORS and VIDEOMOD.
Once I realised that both changes to FREQ (synchro) and RES (videomode) only need to be detected in GLUE the rest of my wakestate-hypothesis fell nicely into place.

This also means, to get back to your comment, that as far as I understand there's no video domain for either of these switches in the STE either. It's possible that changes to RES are detected slightly different by the Shifter, but they don't seem to be involved in sync lines.
Another thing to consider is that there may be a completely different wakestate in the STE, an actual DL2. My theory is that on the standard ST the clock cascade between the 8MHz and 16MHz clock and the propagation delay in the Glue causes it to just miss the accept window for DE and that adds 4 cycles to the latency. However, it's possible that in the STE the propagation delays are lower and so it can actually achieve DL2.
Yes, I agree. After having gone through the timings in the STE it seems all HSYNC and BLANK signals are two cycles earlier compared to the ST (in WS1), but the cycles at which RES and FREQ are checked and DE raised aren't. This should mean that an STE screen is physically offset two pixels to the right of (the already rightmost) WS1/DL6 on an ST.

If this _isn't_ the case I would assume there's a two cycle (at least) less lag internally from "MMU" picking up "GLUE DE" in the MCU :P Which sounds very likely.
LOAD and DE are both available on the board (they're not internal signals, since they have to travel from the MCU to the GST Shifter), but I haven't got an STE in bits handy. (Plus the STE is a much less interesting candidate for examination to me, given that its importance in the scheme of things is pretty low.)
From the article above I deduce that his DCYC (on the MCU) is LOAD on the Shifter, and it also receives DE and BLANK (btw, it seems BLANK on STE is on the same voltage as black/no color instead of lower as on the ST?). It's also the MMU/MCU that decides when to increment video counters (05/07/09).

This becomes quite interesting when considering the pre-fetch. I have a working hypothesis as to how the 20 byte left border (as well as +4 and +6 that I found, which should be able to create 162 (a new one), 164, 166, 208 and 210 byte lines*) works - but it involves the MMU LOADing up the Shifter _without_ increasing the video address counters - that decision is taken after the 16 cycles depending on the other hardware scroll registers. That logic is clearly more complex in the STE compared to the STF, if so.

(And if it preloads for 16 cycles in high res as well then it actually preloads for 8 cycles before even knowing if there's a screen to start displaying. THAT would look strange in a trace .. )

edit: Maybe I should point out that I assume horizontal scroll is always 16 pixels in all resolutions and thus pre-fetch should need one word (4 cycles) in high res, two words (8) in medium and four (16) in low. That works out well with the timings as well, so, no strange traces. It was hypothetical ;)
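
The pre-fetch arithmetic in the edit above can be written out as a small sketch. It assumes, as the post does, that the scroll range is always 16 pixels and that one word is fetched every 4 cycles; the names `BITPLANES` and `prefetch` are made up for illustration.

```python
# Sketch of the pre-fetch hypothesis: 16 pixels need one word per bitplane.
# Assumption (from the post): one word is fetched every 4 cycles.
BITPLANES = {"low": 4, "medium": 2, "high": 1}  # bitplanes per resolution
CYCLES_PER_WORD = 4


def prefetch(resolution):
    """Return (words, cycles) needed to preload 16 pixels."""
    words = BITPLANES[resolution]  # one 16-bit word per bitplane
    return words, words * CYCLES_PER_WORD


for res in ("high", "medium", "low"):
    words, cycles = prefetch(res)
    print(f"{res}: {words} word(s), {cycles} cycles")
```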

/Troed

*) I don't see a need to add these to emulators at the moment ;) Unless someone writes a demo for the sole purpose of not being emulated correctly ...
Dio
Captain Atari
Posts: 451
Joined: Thu Feb 28, 2008 3:51 pm

Re: horizontal scrolling on ST

Post by Dio »

troed wrote:Once I realised that both changes to FREQ (synchro) and RES (videomode) only need to be detected in GLUE the rest of my wakestate-hypothesis fell nicely into place.

This also means, to get back to your comment, that as far as I understand there's no video domain for either of these switches in the STE either. It's possible that changes to RES are detected slightly differently by the Shifter, but they don't seem to be involved in sync lines.
Ah, hang on. I've just checked the timing tester (where I observed this behaviour) and it's not the sync register, but the screencurrent register. So let me run through the whole thing from scratch:

On both STE and STFM writes to the resolution register are always delayed until a video phase, since they need to be transferred through the bus gateway onto the DRAM/Shifter bus which is only available on video phases.

On the STFM writes to the sync register go only to the Glue, and can happen on either CPU phases or video phases. Similarly, reads from screencurrent go only to MMU and can happen on either phase.

On the STE reads from screencurrent can only happen on the video phases. I observed this when I was writing the instruction timing tester. It's easy to write a program that detects this behaviour.

So I don't know if the sync register is in the CPU or the video domain on the STE.
ljbk
Atari Super Hero
Posts: 514
Joined: Thu Feb 19, 2004 4:37 pm
Location: Estoril, Portugal

Re: horizontal scrolling on ST

Post by ljbk »

Hello !

I just want to clarify one thing: the "noline1" and "noline2" i refer to in my Excel sheet do not correspond to "clean" 0 byte lines.
In fact, it seems that the normal "rest" of the screen is moved down by one line and the number of bytes read by the MMU will be the same at the end of the VBL.
Although i did not test such extreme cases, i doubt that the "noline" effects, like the 14 byte line, can be repeated for many lines without causing some serious image problems.
I also want to point out that mixing 508-cycle lines (NTSC ones) with 512-cycle lines (PAL ones), even if only one line differs, might lead to image bending on some TV screens. The same occurs if you miss the correct spot to go back to 50 Hz when doing a +2 byte line.
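
To put numbers on the 508 versus 512 cycle difference, here is a rough sketch. It assumes a nominal 8 MHz clock (the exact crystal frequency is ignored), and the 313-line frame in the example is only an illustrative parameter; `line_time_us` and `frame_cycles` are hypothetical helper names.

```python
# Rough timing sketch for mixed 512-cycle (PAL) and 508-cycle (NTSC) lines.
CPU_HZ = 8_000_000  # assumption: nominal 8 MHz, not the exact crystal value


def line_time_us(cycles):
    """Duration of one scanline in microseconds."""
    return cycles * 1_000_000 / CPU_HZ


def frame_cycles(lines, short_lines=0):
    """Total cycles in a frame where `short_lines` lines are 508 cycles long."""
    return (lines - short_lines) * 512 + short_lines * 508


print(line_time_us(512))     # a 512-cycle line
print(line_time_us(508))     # a 508-cycle line, half a microsecond shorter
print(frame_cycles(313, 1))  # one short line removes 4 cycles from the frame
```

A single short line shifts everything after it by 4 cycles relative to a frame made only of 512-cycle lines, which is the kind of discontinuity a TV may react to.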

@Troed
In your wiki document you can mention the other way to get a 160 byte line: +2 at start and -2 at end.


Paulo.
troed
Atari God
Posts: 1460
Joined: Mon Apr 30, 2012 6:20 pm
Location: Sweden

Re: horizontal scrolling on ST

Post by troed »

Dio wrote:On both STE and STFM writes to the resolution register are always delayed until a video phase, since they need to be transferred through the bus gateway onto the DRAM/Shifter bus which is only available on video phases.
Alright, I'm doing my best to follow here ;) IIRC every other cycle is a video phase? So, this will have the same consequence as far as quantization goes as the limit already being set by the instructions available (MOVE, EXG+MOVE)?
On the STFM writes to the sync register go only to the Glue, and can happen on either CPU phases or video phases.
I'll take your word for it :) It's beyond my knowledge - to me it's "the same bus".
So I don't know if the sync register is in the CPU or the video domain on the STE.
If I understand it correctly what will happen is a two cycle quantization and then it's "only" a matter of which of the cycles the write to the register happens on?
ljbk wrote:I just want to clarify one thing: the "noline1" and "noline2" i refer to in my Excel sheet do not correspond to "clean" 0 byte lines. In fact, it seems that the normal "rest" of the screen is moved down by one line and the number of bytes read by the MMU will be the same at the end of the VBL.
Yes, as far as I can see a cancelled HSYNC (either by extending the blank just before it or by making sure it doesn't trigger) will as a side effect not increase the HBL counter. Since the monitor/TV will see it as a single line taking twice the normal number of cycles, it will, as you wrote, most likely have almost immediate distorting effects.
@Troed
In your wiki document you can mention the other way to get a 160 byte line: +2 at start and -2 at end.
Sure, I'll add it as an example of how to count the amount of bytes for a line. But I'll repeat my wish for everyone to edit it as you see fit - it's the same login for the wiki as for the forum. I don't really want it to be seen as mine :P

(Actually, I should link to the post where you attached the synclin0 program as well!)

Btw, it's my hope that when my understanding of the Shifter state machine increases it might be possible for us to figure out what causes your 4 pixel normal screen sync scroller to sometimes fail (WS1, and in some WS3 variants). And as a pipe dream - maybe even a solution.

You made a post earlier in this thread:
A line 158 in WS2 can do the following as far as i remember without looking at my notes:
- line started 0 shifted => next line starts 0 shifted;
- line started -4 shifted => next line starts 0 shifted;
- line started -8 shifted => next line starts -8 shifted;
- line started -12 shifted => next line starts -12 shifted;
A line 162 in WS2 can do the following as far as i remember without looking at my notes:
- line started 0 shifted => next line starts -4 shifted;
- line started -4 shifted => next line starts -4 shifted;
- line started -8 shifted => next line starts -8 shifted;
- line started -12 shifted => next line starts -12 shifted;
Which is exactly the behaviour I'm hoping to be able to replicate in pseudo code. Am I correct that, during the years you've worked on the 4 pixel scroller, you've extensively documented this behaviour?
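
As a starting point, the quoted WS2 observations can be captured as a plain lookup table. This is only a transcription of the list above, not a model of why the Shifter behaves this way; `NEXT_SHIFT_WS2`, `next_shift` and `run` are names invented for the sketch.

```python
# Transcription of the quoted WS2 observations:
# line length in bytes -> {shift at start of line: shift at start of next line}
NEXT_SHIFT_WS2 = {
    158: {0: 0, -4: 0, -8: -8, -12: -12},
    162: {0: -4, -4: -4, -8: -8, -12: -12},
}


def next_shift(line_bytes, shift):
    """Shift of the following line, per the quoted table."""
    return NEXT_SHIFT_WS2[line_bytes][shift]


def run(lines, shift=0):
    """Apply a sequence of line lengths and return the final shift."""
    for line_bytes in lines:
        shift = next_shift(line_bytes, shift)
    return shift


print(run([162]))       # a 162 byte line moves an unshifted screen to -4
print(run([162, 158]))  # a following 158 byte line brings it back to 0
```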

Like with your Excel sheet - it's a treasure trove of information.

/Troed
troed
Atari God
Posts: 1460
Joined: Mon Apr 30, 2012 6:20 pm
Location: Sweden

Re: horizontal scrolling on ST

Post by troed »

ljbk wrote:Again for emulator programmers, i wish to add that the "kind-of-C" pseudo code has one flaw.
It might never happen but one has to consider the following case:

cycle 360 : FFFF8260: $00 -> $02
cycle 368 : FFFF820A: $02 -> $00
cycle 376 : FFFF820A: $00 -> $02
cycle 384 : FFFF8260: $02 -> $00

The change at cycle 368 should cause a line 158 type ending (1 word less going to the Shifter).
But in fact we get a Right Border opening due to the FFFF8260 change to $02.
So apparently, any changes to FFFF820A will only be considered by the GLUE at the moment when FFFF8260 comes back to 0 or 1. The same kind of timing overlaps may occur at the start of the screen for the 0 bytes line case.
While going through old posts in this thread for attribution at the wiki page I came across the above - and since it directly applies to state machine pseudo code that I've never been satisfied with I thought I would give it a go ;)

52	IF(FREQ == 60) H = TRUE
56	IF(FREQ == 50) && (RES == LO) H = TRUE
That just looks illogical. Those checks ought to be the same in 60Hz and 50Hz, that is, both should check RES as well. To test this we have to be able to be in 60Hz and switch to HI at 372 and be back at LO already at 376 - which luckily we can do. Thanks to NoCrew's 4 pixels raster-trick (ADD/MOVE*) it's indeed possible in special cases to switch back'n'forth in four cycles. It's not really useful for sync tricks on the STF due to wakeup modes, but I did my test on STE and have no reason to believe the results aren't true for ST as well.

Results:

hi/lo (50Hz)
372/376 - 160 byte line
376/380 - 204 byte line

hi/lo (60Hz)
372/376 - 204 byte line
376/380 - 158 byte line

The above leads me to conclude that the tests at 52, 56, 372 and 376 on ST (36, 40, 372 and 376 on STE) should look like this instead:

52	IF(FREQ == 60) && (RES == LO) H = TRUE
56	IF(FREQ == 50) && (RES == LO) H = TRUE
...
372	IF(FREQ == 60) && (RES == LO) H = FALSE
376	IF(FREQ == 50) && (RES == LO) H = FALSE
You learn something new every day ;) I've been itching to change this just because it made more sense but it feels a lot better having actually verified it.
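
The revised end-of-line checks can be replayed against the measured results with a short sketch. Several things here are assumptions rather than facts from the thread: a register write at cycle N is treated as visible to a check at cycle N, the line is taken to have started at cycle 56 (a 50 Hz start), and the 204 byte result is simply the value reported above for an open right border. `value_at` and `line_bytes` are hypothetical helpers.

```python
# Replay of the revised checks at 372 and 376 (ST numbering).
LO, HI = "LO", "HI"


def value_at(writes, initial, cycle):
    """Register value seen at `cycle`.

    Assumption: a write at cycle N is already visible to a check at cycle N.
    """
    value = initial
    for when, new in sorted(writes):
        if when <= cycle:
            value = new
    return value


def line_bytes(freq_writes, res_writes, freq0=50, res0=LO, start=56):
    """Line length in bytes under the assumptions named above."""
    for cycle, wanted_freq in ((372, 60), (376, 50)):
        freq = value_at(freq_writes, freq0, cycle)
        res = value_at(res_writes, res0, cycle)
        if freq == wanted_freq and res == LO:
            return (cycle - start) // 2  # 2 bytes are displayed every 4 cycles
    return 204  # no check fired: right border open (value taken from the post)


print(line_bytes([], [(372, HI), (376, LO)], freq0=50))
print(line_bytes([], [(376, HI), (380, LO)], freq0=50))
print(line_bytes([], [(372, HI), (376, LO)], freq0=60))
print(line_bytes([], [(376, HI), (380, LO)], freq0=60))
# ljbk's overlapping case quoted earlier in this post
print(line_bytes([(368, 60), (376, 50)], [(360, HI), (384, LO)]))
```

Under these assumptions the same two checks also give a right border for ljbk's overlapping case, without needing a special rule for FFFF820A changes made while FFFF8260 is $02.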

/Troed

*) move.b d1,(a0) / move.b d0,(a0) writes the values at the beginning of the instruction, and the instruction takes 8 cycles. The switch is thus 8 cycles long and would cover both tests above. Using add.b d1,(a0) / move.b d0,(a0) will write the value 8 cycles into the 12 cycles long add-instruction, 4 cycles before the second move is done. The switch is now only 4 cycles long, although the total time is increased to 12+8.
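
The arithmetic in this footnote, as a small sketch. It follows the footnote's own convention for when each instruction's write lands (offset 0 for move.b, offset 8 for add.b) and does not claim anything about real 68000 bus timing beyond that; `switch_and_total` is a made-up helper.

```python
# Length of a register switch made by two consecutive instructions,
# using the write offsets given in the footnote.
def switch_and_total(first_cycles, first_write_at,
                     second_cycles=8, second_write_at=0):
    """Return (cycles between the two writes, total cycles of both instructions)."""
    switch = (first_cycles - first_write_at) + second_write_at
    return switch, first_cycles + second_cycles


print(switch_and_total(8, 0))   # move.b / move.b
print(switch_and_total(12, 8))  # add.b / move.b
```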
Dio
Captain Atari
Captain Atari
Joined: Thu Feb 28, 2008 3:51 pm

Re: horizontal scrolling on ST

Post by Dio »

troed wrote:I'll take your word for it :) It's beyond my knowledge - to me it's "the same bus".
There are two data buses in the ST, the CPU bus and the DRAM / Shifter bus. The two are bridged by a bus gateway - a buffer and a latch, controlled by RDAT, WDAT and LATCH on the MMU.

The DRAM bus is segmented by MMU into two phases each of two 8MHz cycles, one of which is a CPU phase and the other of which is a video / refresh (and sound on the STE) phase. If the CPU tries to access DRAM or the Shifter on a video phase, DTACK is withheld for two clock cycles to insert a pair of wait states and align the access onto the CPU phase. But anything only on the CPU bus (including Glue and the MMU itself, at least in the STFM) can be accessed on any phase.
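
A minimal sketch of that phase alignment, for readers who think in code. It assumes the CPU phase comes first in each 4-cycle slot (which phase sits at cycle 0 is arbitrary here) and it only models the two wait states described above; the function names are illustrative.

```python
# Sketch of the 2+2 cycle bus phases described above.
PHASE_CYCLES = 2  # each phase lasts two 8 MHz cycles


def is_cpu_phase(cycle):
    """Assumption: cycles 0-1 of every 4-cycle slot are the CPU phase."""
    return (cycle // PHASE_CYCLES) % 2 == 0


def dram_access_cycle(cycle):
    """Cycle at which a DRAM/Shifter access starting at `cycle` is served."""
    if is_cpu_phase(cycle):
        return cycle
    return cycle + 2  # DTACK withheld for two cycles (a pair of wait states)


def glue_access_cycle(cycle):
    """Glue sits on the CPU bus only, so no alignment is needed (STFM case)."""
    return cycle


for c in range(8):
    print(c, dram_access_cycle(c), glue_access_cycle(c))
```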
troed
Atari God
Posts: 1460
Joined: Mon Apr 30, 2012 6:20 pm
Location: Sweden

Re: horizontal scrolling on ST

Post by troed »

Tonight I wanted to pin down the exact location where 512 resp. 508 cycles is decided on the STE. And I came away with some very strange results that I want to share - speculate away :P

First, in the STF we can see Paulo having documented cycle 54 (WS1) as being a position related to sync. This is right in between the starting points of NTSC and PAL lines, and so has been the reason why +2 and 0-byte lines were tricky to create. Depending on whether FREQ is 60 or 50 the line is 508 or 512 cycles - and if RES is HI sync pulses become very strange indeed.

On the STE, the starting points for NTSC and PAL (due to pre-loading for hardware scroll) are 16 cycles earlier, so I wanted to verify if the 508/512 decision had moved as well. The results have me confused:

56 Line changes depending on FREQ 50 or 60, as well as whether RES is LO or HI (similar to ST cycle 54)
58 Line changes depending on FREQ 50 or 60, RES irrelevant

Now, since I'm testing with an LCD TV it's very difficult to know what happens, since they tend to try to smooth out any incoming signal issues - but when I did 40 lines with a 60/50 switch at each of the above positions, with colour markers to see if they line up, this is what I got:

56: lines 1 and 2 slightly offset, then a huge jump to lines 3 and 4, then slightly back, with all other lines vertically aligned
58: lines 1 and 2 slightly offset, then all others vertically straight.

Thoughts? I tried changing from 128 to 127 nop lines in the code since I thought that maybe the decision whether the line is 508 or 512 cycles is separate (!) from the actual pixel clock setting (60Hz/50Hz) - but can't say I succeeded.
Dio wrote:If the CPU tries to access DRAM or the Shifter on a video phase, DTACK is withheld for two clock cycles to insert a pair of wait states and align the access onto the CPU phase. But anything only on the CPU bus (including Glue and the MMU itself, at least in the STFM) can be accessed on any phase.
Thanks for the explanation. Does this mean the MMU consults a memory map to know whether it needs to do this for a specific memory address or not?
ljbk
Atari Super Hero
Posts: 514
Joined: Thu Feb 19, 2004 4:37 pm
Location: Estoril, Portugal

Re: horizontal scrolling on ST

Post by ljbk »

Hi !

Just to add my 2 cents to the STE case, i would like to point out the following.
In the 2006 thread, referred to previously, as i don't have an STE, i asked the forum if someone could run a few test programs in order to have the 1+3 lines syncscroll work on STE as well. Based on the responses, i made an STE patch case.
If you look at the source available at DHS, you will see that there is only one STE patch related to line sizes. It is related to the 0 bytes case: 60Hz has to be set at emulator cycle 40 and we go back to 50Hz at emulator cycle 52.
So i assume that the +2 case works in the same way for STE as for STF for the detected (old) wakeup state on STE by that program (probably 1).
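
For what it is worth, the 0 byte patch timing can be checked against the STE start-of-line tests Troed gave earlier in the thread (36 for 60Hz, 40 for 50Hz). The sketch below assumes that "emulator cycle" numbering lines up with those check positions and that a write at cycle N is visible to a check at cycle N; neither is confirmed here. `value_at` and `line_start` are hypothetical helpers.

```python
# Does the STE 0 byte patch (60Hz at 40, 50Hz at 52) dodge both start checks?
def value_at(writes, initial, cycle):
    """Register value at `cycle`; a write at cycle N is assumed visible at N."""
    value = initial
    for when, new in sorted(writes):
        if when <= cycle:
            value = new
    return value


def line_start(freq_writes, freq0=50, res="LO", checks=((36, 60), (40, 50))):
    """Return the cycle at which H becomes TRUE, or None for a 0 byte line."""
    for cycle, wanted_freq in checks:
        if value_at(freq_writes, freq0, cycle) == wanted_freq and res == "LO":
            return cycle
    return None


print(line_start([]))                      # plain 50Hz line
print(line_start([], freq0=60))            # plain 60Hz line
print(line_start([(40, 60), (52, 50)]))    # the STE patch timing
```

With those assumptions the patch timing misses both checks, which is at least consistent with it producing a 0 byte line.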

I will try to find that precise spot in the 2006 thread and if i do find it, i will update this post.

Paulo.

Edit:
Here is the spot: http://www.atari-forum.com/viewtopic.ph ... &start=125
MiggyMog reported his results on a real STE.