horizontal scrolling on ST

troed · Post by **troed** » Mon Sep 30, 2013 10:26 am

Hi all,

I believe I have a working and empirically tested hypothesis, consistent with theory, as to how the wakestates arise.

This post documents that hypothesis.

As always, knowledge does not appear out of thin air. Thanks to Dio for measurements and theory, to mc6809e for voicing the 32->8 clock boundaries as causal and to Paolo - as always - for the wakestate discovery, tests and documentation as found in his Excel sheet.

With that said;

Prerequisites:

1) As seen from the CPU, the GLUE - while also at 8MHz - can be offset 0-3 cycles due to lack of synchronization when powered on. This means that when we talk about "cycle 56" it can in reality mean cycle 56-59 for GLUE.

2) Both FREQ (ff820a) and RES (ff8260) are inside GLUE. While Shifter has a copy of ff8260, all modifications to these two registers causing sync changes are GLUE and GLUE only.

3) When GLUE checks the state of FREQ and RES, due to unknown implementation/wiring/signal-propagation reasons, RES checks are one cycle later than FREQ checks.

Result:

The above causes the following timing possibilities if we look at when GLUE really checks (as seen by the CPU) the interesting "cycle 56" position for 000-byte line:

FREQ 56, RES 57
FREQ 57, RES 58
FREQ 58, RES 59
FREQ 59, RES 60

Depending on whether FREQ == 50 and RES == LO when these checks are made, H (thus DE) is raised one cycle later (58-61).

Dio has documented that depending on wakestate, there's a lag from raised DE to MMU raising LOAD of 3-6 cycles. The MMU is not affected by the GLUE 0-3 offset, which means it looks like this:

MMU detects GLUE DE at cycle 62* and raises LOAD at cycle 64

64-58 = 6 = DL6
64-59 = 5 = DL5
64-60 = 4 = DL4
64-61 = 3 = DL3
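
As a rough sketch, the chain above can be written out in a few lines of Python. This only illustrates the hypothesis and is not a hardware model: the function and variable names are made up, and the fixed LOAD cycle of 64 and the one-cycle steps between FREQ check, RES check and DE are taken straight from the numbers in this post.

```python
LOAD_CYCLE = 64  # cycle at which the MMU raises LOAD (number taken from the post)


def glue_timing(offset, base_cycle=56):
    """Return (freq_check, res_check, de_raised, dl) for a GLUE offset of 0-3.

    Hypothetical helper: models the "cycle 56" position only.
    """
    if offset not in range(4):
        raise ValueError("GLUE offset must be 0..3")
    freq_check = base_cycle + offset  # FREQ is checked first
    res_check = freq_check + 1        # RES is checked one cycle later
    de_raised = res_check + 1         # H (thus DE) is raised one cycle after that
    dl = LOAD_CYCLE - de_raised       # DE-to-LOAD lag in cycles
    return freq_check, res_check, de_raised, dl


for offset in range(4):
    freq, res, de, dl = glue_timing(offset)
    print(f"offset {offset}: FREQ {freq}, RES {res}, DE {de}, DL{dl}")
```

Running it lists the four FREQ/RES pairs and the DL6 to DL3 lags given above, one per GLUE offset.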

If we for a second shift our attention to visible pixels on screen, we must remember that GLUE decides (through HSYNC) where the screen is physically placed by the monitor. If GLUE is "late" 3 cycles, as in the DL3 example above, the distance between the screen start and the LOADed pixels displayed by the Shifter will be shorter. We should thus see the screen being shifted one pixel per DL-state above - which is exactly what Dio has documented. DL3 leftmost, DL6 rightmost.

Alright, back to GLUE being offset 0-3 cycles compared to the CPU. There's no way for us to test or detect this from software; we simply don't have one-cycle resolution on the ST. We do however have two-cycle resolution thanks to the use of EXG before MOVE instructions. If we try to change the values of FREQ and RES with as much detail as possible, for GLUE to pick it up, it will look like this:

Changes made by CPU at FREQ 56, RES 56 - read by GLUE at FREQ 56, RES 57
Changes made by CPU at FREQ 56, RES 58 - read by GLUE at FREQ 57, RES 58
Changes made by CPU at FREQ 58, RES 58 - read by GLUE at FREQ 58, RES 59
Changes made by CPU at FREQ 58, RES 60 - read by GLUE at FREQ 59, RES 60

The above is not guesswork. Those are the exact values as documented by Paolo and I have posted them before, but like this:

WS1 (DL6): screen DE is FREQ 56, RES 56
WS3 (DL5): screen DE is FREQ 56, RES 58
WS4 (DL4): screen DE is FREQ 58, RES 58
WS2 (DL3): screen DE is FREQ 58, RES 60

(Verify with the Excel sheet - Default RES values are for WS3/4. In WS1 all RES state checks happen 2 cycles earlier, in WS2 2 cycles later. Default FREQ values are for WS1/3, in WS2/4 all FREQ state checks happen 2 cycles later. The above is the result for a specific position)
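
The step from GLUE check cycles to CPU write positions can be sketched the same way. The assumption made here for illustration is that the CPU can only place a write on an even cycle, so the latest usable write is the last even cycle at or before the GLUE check. The offset-to-wakestate mapping is the one listed above; the helper names are invented.

```python
# GLUE offset (cycles) -> wakestate name, as listed in the post
WAKESTATE_BY_OFFSET = {0: "WS1", 1: "WS3", 2: "WS4", 3: "WS2"}


def latest_cpu_write(check_cycle):
    """Latest even cycle at or before the GLUE check (two-cycle CPU resolution)."""
    return check_cycle - (check_cycle % 2)


def cpu_positions(offset, base_cycle=56):
    """Return the (FREQ, RES) write cycles the CPU has to use for a GLUE offset."""
    freq_check = base_cycle + offset
    res_check = freq_check + 1  # RES is checked one cycle after FREQ
    return latest_cpu_write(freq_check), latest_cpu_write(res_check)


for offset, ws in WAKESTATE_BY_OFFSET.items():
    freq, res = cpu_positions(offset)
    print(f"{ws} (DL{6 - offset}): FREQ {freq}, RES {res}")
```

Under that assumption the four offsets collapse onto exactly the four FREQ/RES pairs documented per wakestate.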

Conclusion: The known detectable wakestates are the result of GLUE being offset 0-3 cycles compared to the CPU - which fits with theory (unsynchronized initialisation at same clock) as well as observation (DE-to-LOAD, visible pixel position on screen) and empirical testing of when changes to FREQ and RES have to be made for GLUE to detect them.

Comments welcome

(There might be some uncertainty as to exactly at which cycle which signal is raised and detected but I don't believe it will change the conclusion)

/Troed

PS: All other known wakestates - Spectrum 512 dots, banded/non-banded etc. - are most likely in the Shifter, whose 32MHz clock, which it then divides further, causes similar possibilities. It will also be "lagged" due to receiving DE directly from GLUE with the possible 0-3 cycle offset, yet receiving LOAD and data from the MMU at a fixed cycle (as fixed as it can be given the different clocks, 16 and 32). That might be causal in why WS1/DL6 isn't possible to make unstable for 4-pixel scrolling - I'll get back to that when I've "finished" the Shifter state machine. edit: Oh, and yes, this should explain why there's a WS3 that sometimes behaves as WS1 when it comes to unstabilization - the mapping between GLUE wakestates and Shifter isn't 1:1 although it seems to be highly influenced.

*) Why it doesn't detect DE at 58 and 60 when raised that early as well? I don't know.

Steven Seagal · Post by **Steven Seagal** » Mon Sep 30, 2013 7:03 pm

troed wrote: Comments welcome

Well, it demands time to digest, but right now I have a question.
Is this compatible with the simpler WU view I posted before, that is every 4 cycles, starting from linecycle 0, 2 are for the CPU, 2 for the MMU, and the order determines WU1 (CPU/MMU) or 2 (MMU/CPU)?

Steven Seagal · Post by **Steven Seagal** » Mon Sep 30, 2013 7:15 pm

MasterOfGizmo wrote:Please forgive me for capturing your great thread. Does one of you have a pointer to a documentation of the exact video modes of an ST (not talking about any scrolling or overscan tricks). I am searching for the very basic timing parameters to re-create it in verilog.

If you can/want to help out please reply here http://www.atari-forum.com/viewtopic.php?f=101&t=25522

I promise not to capture this thread again ...

I think Dio answered in your thread and he's more a hardware expert. Have you checked his trace graphs?
What could help is looking at the SM124 & SC1224 service manuals in pdf, they may include timing charts.
I'm preparing some doc myself but it's not ready yet.

Below is a copy/paste of an Excel sheet. The format is very bad, which is a shame; I'm not making fun, it's just to show you that I can't help much more yet. Some of the info is what I found in the (also incomplete) pdf files.
(PAL "emulator" STF)

Code: Select all

Frequency                     50           60           72
shift mode                                              2
cycles/scanline               512          508          224
# scanlines                   313          263          500
top                           64           34           34
DE                            200          200          400
bottom                        49           29           66
cycles/frame                  160256       133604       112000
cycles/sec                    8021248      8021248      8021248
refresh rate (Hz)             50.05271565  60.03748391  71.61828571
#HBL/sec                      15666.5      15789.85827  35809.14286

DE cycles                     320          320          160
Bytes fetched                 160          160          80
Start of DE cycle             56           52           4
left off bonus                26           24           0
First pixel cycle             84           80           32
End of DE cycle               376          372          164
HBL cycle                     464          460          180
cycles to HBL                 88           88           16
right off bonus               44           44           0
cycles to new line            48           48           44
right off bonus               24           24           0
                              512          508          224

                              8000000      8000000      8000000

                                                        0.000028

                                                        160

                                                        28

Monitor
Hor freq (KHz)                                          35.7
Vert freq (Hz)                                          71.2
horizontal retrace time (µs)                            6.3
vertical retrace time (µs)                              420

line (µs)                     64           63.5         28
hsync (µs)                                 5            3
hblank                                     10.5         3

left                                       6            3.5
de                                         40           20
right                                      7            1.5
                                           53           25

                                           63.5         28

vblank ms                                  1.396
vsync                                      0.19

all                                        16.69

top                                        1.184
de                                         12.7
bottom                                     1.41
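
The derived rows in the first part of the sheet follow from a handful of inputs, which a short Python sketch can recompute. The inputs (cycles per scanline, top/DE/bottom line counts and the 8021248 cycles/sec clock) are copied from the sheet; the dictionary layout and function name are only for illustration.

```python
CLOCK_HZ = 8021248  # cycles/sec, as given in the sheet

# Per-frequency inputs copied from the sheet
MODES = {
    50: {"cycles_per_line": 512, "top": 64, "de": 200, "bottom": 49},
    60: {"cycles_per_line": 508, "top": 34, "de": 200, "bottom": 29},
    72: {"cycles_per_line": 224, "top": 34, "de": 400, "bottom": 66},
}


def frame_timing(mode):
    """Recompute scanlines, cycles/frame, refresh rate and HBL rate for a mode."""
    m = MODES[mode]
    scanlines = m["top"] + m["de"] + m["bottom"]
    cycles_per_frame = m["cycles_per_line"] * scanlines
    return {
        "scanlines": scanlines,
        "cycles_per_frame": cycles_per_frame,
        "refresh_hz": CLOCK_HZ / cycles_per_frame,
        "hbl_per_sec": CLOCK_HZ / m["cycles_per_line"],
    }


for mode in MODES:
    print(mode, frame_timing(mode))
```

The recomputed values agree with the scanline, cycles/frame, refresh rate and #HBL/sec rows of the sheet.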

Steven Seagal · Post by **Steven Seagal** » Mon Sep 30, 2013 7:29 pm

troed wrote: Can the STE hardware scroll in high res as well? If so, it needs 16 cycles (?) earlier pre-fetch compared to ST for high res as well - which should mean that the left border switch is earlier (which is true - "late" ST left border code, like ours in SYNC - doesn't work on STE) and DE is earlier. 8 - 16 = 504 so it sort of matches up, although it doesn't explain why a left border switch at 0 works. I wonder if this means that the STE HSYNC pulse is shorter. Also, maybe the prefetch is only 4 cycles for high res - I'm just speculating here and haven't studied it enough. STE is "after my time", you know

http://atari-ste.anvil-soft.com/html/devdocu2.htm

Coming back to this...
HSCROLL works in med res, this demo Cool STE proves it.
When I "fixed" it (meaning fixing Steem), I noticed that HSCROLL was in pixels, not in cycles, and this is what the STE programmer would expect. So, one raster. Probably the same for HIRES.
This would mean, 16 cycles scroll/prefetch in low res, 8 in med res and only 4 in HIRES.
What's nice about it is that it explains the STE limit for "left off": 6-4 = 2 (a new revelation!)
Someone could test this with +2 lines in med res. eg if S2 at cycle 44 works.

npomarede · Post by **npomarede** » Mon Sep 30, 2013 8:06 pm

Steven Seagal wrote: This would mean, 16 cycles scroll/prefetch in low res, 8 in med res and only 4 in HIRES.
What's nice about it is that it explains the STE limit for "left off": 6-4 = 2 (a new revelation!)
Someone could test this with +2 lines in med res. eg if S2 at cycle 44 works.

I don't have an STE to test this, but I don't think scrolling prefetch is different from the normal STF shifter's fetching, where the shifter always reads 2 bytes every 4 cycles, whatever the resolution is. As scrolling prefetch data needs to be sent to the "normal" internal shifter registers used for each plane, you need to prefetch at the same speed as the shifter normally fetches.
This could be tested on a real STE by reading ff8209 during border time (starting at cycle 0 of a line), then doing something like this to get a rough idea of where the address gets incremented and by how many bytes at a time:

Code: Select all

; get in sync with ff8209 and wait for cycle 0 on next line
; a0=$ff8209    a1=small buffer for at least 8 bytes
	rept	8
	move.b	(a0),(a1)+	; sample video address low byte, 12 cycles per read
	endr

This only has 12 cycles precision; by adding 1 or 2 nop before, it's possible to get down to a 4 cycles precision to see where the value read through a0 changed.
Running this piece of code with ff8260=0, then 1, then 2 and comparing the collected values will show if prefetch is done at a different rate depending on the resolution.
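
Once the 8 bytes are collected, comparing runs comes down to looking at where the low address byte steps and by how much. A small Python sketch of that comparison follows. The sample values in it are invented purely to show the output format (they are not measurements), and the cycle attached to each read is only approximate since it ignores where inside the instruction the bus read happens.

```python
CYCLES_PER_READ = 12  # move.b (a0),(a1)+ takes 12 cycles


def address_steps(samples, first_cycle=0):
    """Return a list of (approximate cycle, delta) for each read where ff8209 changed."""
    steps = []
    for i in range(1, len(samples)):
        delta = (samples[i] - samples[i - 1]) & 0xFF  # low byte wraps at 256
        if delta:
            steps.append((first_cycle + i * CYCLES_PER_READ, delta))
    return steps


# Hypothetical buffer contents, NOT real hardware data
made_up_samples = [0x00, 0x00, 0x00, 0x02, 0x08, 0x0E, 0x14, 0x1A]
print(address_steps(made_up_samples))
```

Shifting the start by 1 or 2 nop changes `first_cycle` by 4 or 8, and overlaying the three runs gives the 4 cycles precision mentioned above.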

Nicolas

troed · Post by **troed** » Mon Sep 30, 2013 8:56 pm

Steven Seagal wrote: Well, it demands time to digest, but right now I have a question.
Is this compatible with the simpler WU view I posted before, that is every 4 cycles, starting from linecycle 0, 2 are for the CPU, 2 for the MMU, and the order determines WU1 (CPU/MMU) or 2 (MMU/CPU)?

Well, as long as it means the GLUE will become offset by 2 cycles as seen by the CPU, you will go from DL6/DL5 to DL4/3 (WU1 and WU2 as collectively known). I'm less sure about other side effects of CPU/MMU reordering (and what effect it would have on my hypothesis if it happens in real hardware. Maybe explaining why some machines are more inclined towards one wakeup than "the" other?).

I'm going to try to shoot holes in my hypothesis for a while now and invite everyone else to as well

If it survives then it might turn out to be a viable model.

(I have also given the Shifter state machine some initial work and so far I'm very confused. Maybe it's just me, but if I were to write some synclines right now I would not be using stabilizers in the same way as I have so far .. and that's not just because at least the hi/lo ones delay BLANK and risk messing up HSYNC if there's pixel data there)

troed · Post by **troed** » Tue Oct 01, 2013 3:20 pm

troed wrote: (I've spent the weekend replacing the DMA chip in my STE and will hopefully find time to play with +20 left border on real hardware soon.

Now done. A similar description to Paolo's below:

Code: Select all

500     HSYNC end
502     left border first possible move to HI
504
506
508
510
0       left border last possible move to HI
2       first possible switch back for 20 byte border
4
6       first possible switch back for 26 byte border

Remember, so far everything points to STE behaving as ST wakestate 1 - that is - 0 cycle offset between CPU and GLUE. This is fully logical due to MMU and GLUE having been merged into one in the STE. So, when comparing the above to Paolo's Excel-sheet, all ff8260 timings in there should have 2 cycles subtracted to end up with wakestate 1 values.

Findings:

1) HSYNC end is indeed 4 cycles earlier on STE than ST. I will continue investigating whether this is due to a shorter pulse or if it's a change in start timing as well.

2) "IF(RES == HI) H = TRUE" is also 4 cycles earlier on STE. This is likely due to pre-fetching; I will understand it better when I'm further into my work on the Shifter state machine.

3) There's a check at cycle 4 that decides between classic 26 (if HI) byte border or 20 (if LO). I'm unsure exactly what that check is and what consequence it has - although I could speculate that it causes the Shifter (which has a copy of 8260) to throw away what had been pre-fetched so far.

4) Not related, but when testing HSYNC end (known as "No Line 2" in Paolo's excel sheet and one of several "0-byte lines" by emulators) I verified that it disrupts sync. It might be interesting to at least comment in emulator trace output - demos using that line will be discolored on my TV.

I will update my GLUE (and MMU) state machine with both the wakestate hypothesis and separate into ST and STE later. I'm thinking the forum wiki and then comment on updates here.

/Troed

troed · Post by **troed** » Tue Oct 01, 2013 10:34 pm

troed wrote: 3) There's a check at cycle 4 that decides between classic 26 (if HI) byte border or 20 (if LO). I'm unsure exactly what that check is and what consequence it has - although I could speculate that it causes the Shifter (which has a copy of 8260) to throw away what had been pre-fetched so far.

This theory gains support the more I test different things on STE to see where it differs from ST (more than I thought, btw). I've just created 164 and 166 byte lines as well

(Which is of little interest on STE of course, but for the novelty)

/Troed

Dio · Post by **Dio** » Wed Oct 02, 2013 10:12 pm

troed wrote:Remember, so far everything points to STE behaving as ST wakestate 1 - that is - 0 cycle offset between CPU and GLUE. This is fully logical due to MMU and GLUE having been merged into one in the STE. So, when comparing the above to Paolo's Excel-sheet, all ff8260 timings in there should have 2 cycles subtracted to end up with wakestate 1 values.

One important thing to note when considering this is that on the STE the sync register is in the video domain, not the CPU domain. It's impossible to do odd phase accesses to it.

Another thing to consider is that there may be a completely different wakestate in the STE, an actual DL2. My theory is that on the standard ST the clock cascade between the 8MHz and 16MHz clock and the propagation delay in the Glue causes it to just miss the accept window for DE and that adds 4 cycles to the latency. However, it's possible that in the STE the propagation delays are lower and so it can actually achieve DL2.

LOAD and DE are both available on the board (they're not internal signals, since they have to travel from the MCU to the GST Shifter), but I haven't got an STE in bits handy. (Plus the STE is a much less interesting candidate for examination to me given that its importance in the scheme of things is pretty low.)

troed · Post by **troed** » Thu Oct 03, 2013 10:39 am

All: I've written up the STE GLUE state machine now, as well as having started on the Wiki article. It's not linked to from anywhere yet since I consider it to be in a draft state - specifically the section on the MMU should probably be worded differently (it's likely not a state machine) and its timing is probably 2 cycles off.

Remember, it's a wiki. Feel free to edit as you see fit - I do not consider myself to "own" it just because I wrote it and updated Alien's old state machines

http://atari-forum.com/wiki/index.php?t ... _Scanlines

Dio wrote:One important thing to note when considering this is that on the STE the sync register is in the video domain, not the CPU domain. It's impossible to do odd phase accesses to it.

(I might misunderstand you, if so I'm sorry. I'm also writing for the much bigger general audience whom I've heard read this thread)

I couldn't wrap my head around the empirical description, which obviously was correct, of how the 000-byte line was produced for quite some time. The problem turned out to be that I was stuck (as I know others have been) in thinking ff820a was "Shifter" and ff8260 was "GLUE". I saw no way for the Shifter to be able to take a decision (60/50Hz switch) at cycle 56 in WS1 which could cause a 000-byte line, yet the GLUE had the same possibility at the same cycle with its HI/LO switch. Two separate circuits can't communicate in 0 cycles, at least that was what confounded me.

The solution was quite obvious (and I believe I've seen Ijor comment upon it already in 2006 here on the forum) but it took until I found the following article for me to realise what was wrong: http://info-coach.fr/atari/hardware/STE-HW.php

The configuration register for synchro (50/60 Hz and synchro int. /ext.) is in GST MCU (and GLUE of the STF) and not in the Shifter, as the localization of the address of this register (see figure 10) could let it think . It is for this register that the GLUE of the STF is provided with two data pins connected to bits 8 and 9 of the bus, which corresponds well to the 2 modifiable bits of the register. But if that explains how the GLUE knows if it must send 50 or 60 Hz synchros, how does it knows that it must send 71 Hz (high-resolution)?
In fact, if you look at the video addresses (figure 10), you will notice that another register is also on bits 8 and 9: VIDEOMOD which makes it possible to indicate the resolution to the shifter, but also to the GLUE (GST MCU on STE). Thus at address FF8260 correspond two registers one in the shifter and one in the GLUE. The only video registers which are in the shifter of the STF are the COLORS and VIDEOMOD.

Once I realised that both changes to FREQ (synchro) and RES (videomode) only need to be detected in GLUE the rest of my wakestate-hypothesis fell nicely into place.

This also means, to get back to your comment, that as far as I understand there's no video domain for either of these switches in the STE either. It's possible that changes to RES are detected slightly different by the Shifter, but they don't seem to be involved in sync lines.

Another thing to consider is that there may be a completely different wakestate in the STE, an actual DL2. My theory is that on the standard ST the clock cascade between the 8MHz and 16MHz clock and the propagation delay in the Glue causes it to just miss the accept window for DE and that adds 4 cycles to the latency. However, it's possible that in the STE the propagation delays are lower and so it can actually achieve DL2.

Yes, I agree. After having gone through the timings in the STE it seems all HSYNC and BLANK signals are two cycles earlier compared to the ST (in WS1), but the cycles at which RES and FREQ are checked and DE raised aren't. This should mean that an STE screen is physically offset two pixels to the right of (the already rightmost) WS1/DL6 on an ST.

If this _isn't_ the case I would assume there's a two cycle (at least) less lag internally from "MMU" picking up "GLUE DE" in the MCU. Which sounds very likely.

LOAD and DE are both available on the board (they're not internal signals, since they have to travel from the MCU to the GST Shifter), but I haven't got an STE in bits handy. (Plus the STE is a much less interesting candidate for examination to me given that its importance in the scheme of things is pretty low.)

From the article above I deduce that his DCYC (on the MCU) is LOAD on the Shifter, and it also receives DE and BLANK (btw, it seems BLANK on STE is on the same voltage as black/no color instead of lower as on the ST?). It's also the MMU/MCU that decides when to increment video counters (05/07/09).

This becomes quite interesting when considering the pre-fetch. I have a working hypothesis as to how the 20 byte left border (as well as +4 and +6 that I found, which should be able to create 162 (a new one), 164, 166, 208 and 210 byte lines*) works - but it involves the MMU LOADing up the Shifter _without_ increasing the video address counters - that decision is taken after the 16 cycles depending on the other hardware scroll registers. That logic is clearly more complex in the STE compared to the STF, if so.

(And if it preloads for 16 cycles in high res as well then it actually preloads for 8 cycles before even knowing if there's a screen to start displaying. THAT would look strange in a trace .. )

edit: Maybe I should point out that I assume horizontal scroll is always 16 pixels in all resolutions and thus pre-fetch should need one word (4 cycles) in high res, two words (8) in medium and four (16) in low. That works out well with the timings as well, so, no strange traces. It was hypothetical.
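
The arithmetic behind that assumption is one word per bitplane for 16 pixels, at 4 cycles per word. A tiny Python sketch, with names chosen only for illustration, and keeping in mind that whether the STE really prefetches this way is exactly what is still being debated in this thread:

```python
CYCLES_PER_WORD = 4  # one word fetched every 4 cycles
BITPLANES = {"low": 4, "medium": 2, "high": 1}


def prefetch_cycles(resolution):
    """Cycles needed to pre-fetch 16 pixels, assuming one word per bitplane."""
    return BITPLANES[resolution] * CYCLES_PER_WORD


for res in BITPLANES:
    print(res, prefetch_cycles(res))
```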

/Troed

*) I don't see a need to add these to emulators at the moment. Unless someone writes a demo for the sole purpose of not being emulated correctly ...

Dio · Post by **Dio** » Thu Oct 03, 2013 1:21 pm

troed wrote:Once I realised that both changes to FREQ (synchro) and RES (videomode) only need to be detected in GLUE the rest of my wakestate-hypothesis fell nicely into place.

This also means, to get back to your comment, that as far as I understand there's no video domain for either of these switches in the STE either. It's possible that changes to RES are detected slightly different by the Shifter, but they don't seem to be involved in sync lines.

Ah, hang on. I've just checked the timing tester (where I observed this behaviour) and it's not the sync register, but the screencurrent register. So let me run through the whole thing from scratch:

On both STE and STFM writes to the resolution register are always delayed until a video phase, since they need to be transferred through the bus gateway onto the DRAM/Shifter bus which is only available on video phases.

On the STFM writes to the sync register go only to the Glue, and can happen on either CPU phases or video phases. Similarly, reads from screencurrent go only to MMU and can happen on either phase.

On the STE reads from screencurrent can only happen on the video phases. I observed this when I was writing the instruction timing tester. It's easy to write a program that detects this behaviour.

So I don't know if the sync register is in the CPU or the video domain on the STE.

ljbk · Post by **ljbk** » Thu Oct 03, 2013 4:28 pm

Hello !

I just want to clarify one thing: the "noline1" and "noline2" i refer to in my Excel sheet do not correspond to "clean" 0 byte lines.
In fact, it seems that the normal "rest" of the screen is moved down by one line and the number of bytes read by the MMU will be the same at the end of the VBL.
Although i did not test such extreme cases, i doubt that the "noline" effects, like the 14 bytes line, can be repeated for many lines without causing some serious image problems.
I also want to point out that mixing 508 cycles lines (NTSC ones) with 512 cycles lines (PAL ones) even if there is only 1 different line might lead to image bending with some TV screens. The same occurs if you miss the correct spot to go back to 50 Hz when doing a +2 bytes lines.

@Troed
In your wiki document you can refer to the other way to get a 160 bytes line: +2 at start and -2 at end.

Paulo.

troed · Post by **troed** » Thu Oct 03, 2013 6:17 pm

Dio wrote:On both STE and STFM writes to the resolution register are always delayed until a video phase, since they need to be transferred through the bus gateway onto the DRAM/Shifter bus which is only available on video phases.

Alright, I'm doing my best to follow here

IIRC every other cycle is a video phase? So, this will have the same consequence as far as quantization goes as the limit already being set by the instructions available (MOVE, EXG+MOVE)?

On the STFM writes to the sync register go only to the Glue, and can happen on either CPU phases or video phases.

I'll take your word for it

It's beyond my knowledge - to me it's "the same bus".

So I don't know if the sync register is in the CPU or the video domain on the STE.

If I understand it correctly what will happen is a two cycle quantization and then it's "only" a matter of which of the cycles the write to the register happens on?

ljbk wrote:I just want to clarify one thing: the "noline1" and "noline2" i refer to in my Excel sheet do not correspond to "clean" 0 byte lines. In fact, it seems that the normal "rest" of the screen is moved down by one line and the number of bytes read by the MMU will be the same at the end of the VBL.

Yes, as far as I can see a cancelled HSYNC (either by extending the blank just before it or by making sure it doesn't trigger) will as a side effect not increase the HBL counter. Since it will be seen by the monitor/TV as a single line taking twice the normal amount of cycles, it will, as you wrote, most likely have almost immediate distorting effects.

@Troed
In your wiki document you can refer to the other way to get a 160 bytes line: +2 at start and -2 at end.

Sure, I'll add it as an example of how to count the amount of bytes for a line. But I'll repeat my wish for everyone to edit it as you see fit - it's the same login for the wiki as for the forum. I don't really want it to be seen as mine

(Actually, I should link to the post where you attached the synclin0 program as well!)

Btw, it's my hope that when my understanding of the Shifter state machine increases it might be possible for us to figure out what causes your 4 pixel normal screen sync scroller to sometimes fail (WS1, and in some WS3 variants). And as a pipe dream - maybe even a solution.

You made a post earlier in this thread;

A line 158 in WS2 can do the following as far as i remember without looking at my notes:
- line started 0 shifted => next line starts 0 shifted;
- line started -4 shifted => next line starts 0 shifted;
- line started -8 shifted => next line starts -8 shifted;
- line started -12 shifted => next line starts -12 shifted;
A line 162 in WS2 can do the following as far as i remember without looking at my notes:
- line started 0 shifted => next line starts -4 shifted;
- line started -4 shifted => next line starts -4 shifted;
- line started -8 shifted => next line starts -8 shifted;
- line started -12 shifted => next line starts -12 shifted;

Which is exactly the behaviour I'm hoping to be able to replicate in pseudo code. Am I correct that, during the years you've worked on the 4 pixel scroller, you've extensively documented this behaviour?
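
As a starting point, the two lists quoted above can be written down as a plain lookup table in Python. This is only a transcription of those eight WS2 cases, not an explanation of why the Shifter behaves that way, and the function name is invented.

```python
# line length in bytes -> {shift at line start: shift at start of next line}
# Transcribed from the quoted WS2 observations for 158 and 162 byte lines.
NEXT_SHIFT = {
    158: {0: 0, -4: 0, -8: -8, -12: -12},
    162: {0: -4, -4: -4, -8: -8, -12: -12},
}


def run_lines(start_shift, line_lengths):
    """Follow the shift state across a sequence of 158/162 byte lines."""
    shift = start_shift
    history = [shift]
    for length in line_lengths:
        shift = NEXT_SHIFT[length][shift]
        history.append(shift)
    return history


print(run_lines(0, [162, 158]))
print(run_lines(-8, [158, 162]))
```

One thing the table makes visible: as far as these two line types go, a line that starts -8 or -12 shifted stays there, while 0 and -4 can be toggled by choosing between 158 and 162.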

Like with your Excel sheet - it's a treasure trove of information.

/Troed

troed · Post by **troed** » Thu Oct 03, 2013 9:12 pm

ljbk wrote:Again for emulator programmers, i wish to add that the "kind-of-C" pseudo code has one flaw.
It might never happen but one has to consider the following case:

cycle 360 : FFFF8260: $00 -> $02
cycle 368 : FFFF820A: $02 -> $00
cycle 376 : FFFF820A: $00 -> $02
cycle 384 : FFFF8260: $02 -> $00

The change at cycle 368 should cause a line 158 type ending (1 word less going to the Shifter).
But in fact we get a Right Border opening due to the FFFF8260 change to $02.
So apparently, any changes to FFFF820A will only be considered by the GLUE at the moment when FFFF8260 comes back to 0 or 1. The same kind of timing overlaps may occur at the start of the screen for the 0 bytes line case.

While going through old posts in this thread for attribution at the wiki page I came across the above - and since it directly applies to state machine pseudo code that I've never been satisfied with I thought I would give it a go

Code: Select all

52	IF(FREQ == 60) H = TRUE
56	IF(FREQ == 50) && (RES == LO) H = TRUE

That just looks illogical. Those checks ought to be the same in 60Hz and 50Hz, that is, both should check RES as well. To test this we have to be able to be in 60Hz and switch to HI at 372 and be back at LO already at 376 - which luckily we can do. Thanks to NoCrew's 4 pixels raster-trick (ADD/MOVE*) it's indeed possible in special cases to switch back'n'forth in four cycles. It's not really useful for sync tricks on the STF due to wakeup modes, but I did my test on STE and have no reason to believe the results aren't true for ST as well.

Results:

hi/lo (50Hz)
372/376 - 160 byte line
376/380 - 204 byte line

hi/lo (60Hz)
372/376 - 204 byte line
376/380 - 158 byte line

The above leads me to conclude that the tests at 52, 56, 372 and 376 on ST (36, 40, 372 and 376 on STE) should look like this instead:

Code: Select all

52	IF(FREQ == 60) && (RES == LO) H = TRUE
56	IF(FREQ == 50) && (RES == LO) H = TRUE
...
372	IF(FREQ == 60) && (RES == LO) H = FALSE
376	IF(FREQ == 50) && (RES == LO) H = FALSE

You learn something new every day

I've been itching to change this just because it made more sense but it feels a lot better having actually verified it.
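
The revised end-of-line checks can be replayed against the four results above with a short Python sketch. Several things in it are assumptions made for the illustration and not part of the test itself: the line is taken to have started at cycle 56, a line whose H is never cleared at 372/376 is taken to run until cycle 464, and bytes are counted as 2 per 4 cycles. All names are invented.

```python
LO, HI = 0, 2


def res_at(cycle, hi_from, lo_from):
    """RES as seen at `cycle` when it is HI from hi_from up to (not including) lo_from."""
    return HI if hi_from <= cycle < lo_from else LO


def line_bytes(freq, hi_from, lo_from, de_start=56, border_end=464):
    """Byte count of a line under the revised 372/376 checks (sketch only)."""
    # 372: IF(FREQ == 60) && (RES == LO) H = FALSE
    # 376: IF(FREQ == 50) && (RES == LO) H = FALSE
    for check_cycle, check_freq in ((372, 60), (376, 50)):
        if freq == check_freq and res_at(check_cycle, hi_from, lo_from) == LO:
            return (check_cycle - de_start) // 2
    # Neither check cleared H: right border stays open (assumed end at 464)
    return (border_end - de_start) // 2


for freq in (50, 60):
    for hi_from in (372, 376):
        n = line_bytes(freq, hi_from, hi_from + 4)
        print(f"{freq}Hz hi/lo {hi_from}/{hi_from + 4}: {n} byte line")
```

With those assumptions the sketch gives 160 and 204 bytes for the two 50Hz cases and 204 and 158 bytes for the two 60Hz cases, the same as the results listed above.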

/Troed

*) move.b d1,(a0) / move.b d0,(a0) writes the values at the beginning of the instruction, and the instruction takes 8 cycles. The switch is thus 8 cycles long and would cover both tests above. Using add.b d1,(a0) / move.b d0,(a0) will write the value 8 cycles into the 12 cycles long add-instruction, 4 cycles before the second move is done. The switch is now only 4 cycles long, although the total time is increased to 12+8.
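
The cycle counting in that footnote can be written out as a small Python sketch. The durations and write positions are the ones stated in the footnote (a simplification of the real bus timing), and the table and function names are made up for illustration.

```python
# instruction -> (duration in cycles, cycle within the instruction where the write lands)
# Values as described in the footnote above.
TIMING = {
    "move.b dn,(a0)": (8, 0),
    "add.b dn,(a0)": (12, 8),
}


def switch_length(first, second):
    """Return (cycles the first value stays written, total cycles of both instructions)."""
    dur1, write1 = TIMING[first]
    dur2, write2 = TIMING[second]
    held = (dur1 - write1) + write2  # from first write to second write
    return held, dur1 + dur2


print(switch_length("move.b dn,(a0)", "move.b dn,(a0)"))
print(switch_length("add.b dn,(a0)", "move.b dn,(a0)"))
```

The first pair holds the switch for 8 cycles, the second for only 4 at a total cost of 12+8 cycles.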

Dio · Post by **Dio** » Thu Oct 03, 2013 10:24 pm

troed wrote:I'll take your word for it It's beyond my knowledge - to me it's "the same bus".

There's two data buses in the ST, the CPU bus and the DRAM / Shifter bus. The two are bridged by a bus gateway - a buffer and a latch, controlled by RDAT, WDAT and LATCH on the MMU.

The DRAM bus is segmented by MMU into two phases each of two 8MHz cycles, one of which is a CPU phase and the other of which is a video / refresh (and sound on the STE) phase. If the CPU tries to access DRAM or the Shifter on a video phase, DTACK is withheld for two clock cycles to insert a pair of wait states and align the access onto the CPU phase. But anything only on the CPU bus (including Glue and the MMU itself, at least in the STFM) can be accessed on any phase.
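
A minimal Python sketch of that phase rule, under stated assumptions: which half of each 4-cycle group is the CPU phase is picked arbitrarily here (it is exactly what differs between machines), accesses are assumed to start on even cycles, and the target names are just labels for the illustration.

```python
PHASE_LEN = 2                  # each phase is two 8MHz cycles
SHARED = {"dram", "shifter"}   # targets reached through the bus gateway


def is_cpu_phase(cycle, cpu_first=True):
    """True if `cycle` falls in the CPU phase (phase order is an assumption)."""
    in_first_half = (cycle % (2 * PHASE_LEN)) < PHASE_LEN
    return in_first_half if cpu_first else not in_first_half


def access_cycle(cycle, target, cpu_first=True):
    """Cycle at which an access started at `cycle` actually goes through."""
    if target in SHARED and not is_cpu_phase(cycle, cpu_first):
        return cycle + PHASE_LEN  # DTACK withheld: a pair of wait states
    return cycle                  # CPU-bus-only targets (e.g. Glue) are never delayed


print(access_cycle(2, "dram"))   # delayed onto the next CPU phase
print(access_cycle(2, "glue"))   # not delayed
```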

troed · Post by **troed** » Fri Oct 04, 2013 7:00 pm

Tonight I wanted to pin down the exact location where 512 resp. 508 cycles is decided on the STE. And I came away with some very strange results that I want to share - speculate away

First, in the STF we can see Paulo having documented cycle 54 (WS1) as being a position related to sync. This is right in between the starting points of NTSC and PAL lines, and so has been the reason why +2 and 0-byte lines were tricky to create. Depending on whether FREQ is 60/50 the line is 508 or 512 cycles - and if RES is HI sync pulses become very strange indeed.

On the STE, the starting points for NTSC and PAL (due to pre-loading for hardware scroll) are 16 cycles earlier, so I wanted to verify if the 508/512 decision had moved as well. The results have me confused:

56 Line changes depending on FREQ 50 or 60, as well as whether RES is LO or HI (similar to ST cycle 54)
58 Line changes depending on FREQ 50 or 60, RES irrelevant

Now, since I'm testing with an LCD TV it's very difficult to know what happens since they tend to try to smoothen out any incoming signal issues - but when I did 40 lines with a 60/50 switch at each of the above positions with colour-markers to see if they line up this is what I got:

56: line 1 and 2 slightly offset then a huge jump to line 3,4 and then slightly back and all other vertically aligned
58: first 1 and 2 slightly offset - then all others vertically straight.

Thoughts? I tried changing from 128 to 127 nop lines in the code since I thought that maybe the decision whether the line is 508 or 512 cycles is separate (!) from the actual pixel clock setting (60Hz/50Hz) - but can't say I succeeded.

Dio wrote:If the CPU tries to access DRAM or the Shifter on a video phase, DTACK is withheld for two clock cycles to insert a pair of wait states and align the access onto the CPU phase. But anything only on the CPU bus (including Glue and the MMU itself, at least in the STFM) can be accessed on any phase.

Thanks for the explanation. Does this mean the MMU consults a memory map to know whether it needs to do this for a specific memory address or not?

ljbk · Post by **ljbk** » Fri Oct 04, 2013 8:29 pm

Hi !

Just to add my 2 cents to the STE case, i would like to point out the following.
In the 2006 thread referred to previously, as i don't have an STE, i asked the forum if someone could run a few test programs in order to get the 1+3 lines syncscroll working on STE as well. Based on the responses, i made an STE patch case.
If you look at the source available at DHS, you will see that there is only one STE patch related to line sizes. It is related to the 0 bytes case: 60Hz has to be set at emulator cycle 40 and we go back to 50Hz at emulator cycle 52.
So i assume that the +2 case works in the same way for STE as for STF for the detected (old)wakeup state on STE by that program (probably 1).

I will try to find that precise spot in the 2006 thread and if i do find it, i will update this post.

Paulo.

Edit:
Here is the spot: http://www.atari-forum.com/viewtopic.ph ... &start=125
MiggyMog reported his results on a real STE.

Dio · Post by **Dio** » Fri Oct 04, 2013 9:43 pm

troed wrote:Thanks for the explanation. Does this mean the MMU consults a memory map to know whether it needs to do this for a specific memory address or not?

My expectation is that MMU owns DTACK for everything going through the gateway and the timing of it is run by the same state machine that manages RDAT, WDAT, LATCH, LOAD etc.

So it's Glue's decoding logic that indicates (I think on the /RAM and/or /DEV signals) to MMU that it's MMU's responsibility to handle the operation and MMU takes it from there.

troed · Post by **troed** » Fri Oct 04, 2013 10:10 pm

ljbk wrote:If you look at the source available at DHS, you will see that there is only one STE patch related to line sizes. It is related to the 0 bytes case: 60Hz has to be set at emulator cycle 40 and we go back to 50Hz at emulator cycle 52.
So i assume that the +2 case works in the same way for STE as for STF for the detected (old)wakeup state on STE by that program (probably 1).

Indeed - if we look at the NTSC and PAL starting positions in the state machines:

STF:
52 IF(FREQ == 60) && (RES == LO) H = TRUE
54 IF(FREQ == 50) LINE = PAL
56 IF(FREQ == 50) && (RES == LO) H = TRUE

STE:
36 IF(FREQ == 60) && (RES == LO) H = TRUE
40 IF(FREQ == 50) && (RES == LO) H = TRUE
56 IF(FREQ == 50) LINE = PAL
58 Also related to line length similar to above for 50/60Hz. Unknown cause.

We see that it's very similar - but because of the STE's hardware scroll capability it needs to pre-load 16 pixels, so the signal to MMU is raised 16 cycles earlier, and the complexity of two start positions with the LINE-length position in between is gone. What I wanted to figure out tonight was the STE equivalent of STF cycle 54, that is, the position where the line length in cycles (508 or 512, and hi res timings as well) is decided.

I did not expect the answer to be two positions with slightly different behaviour

(Btw, contrary to what I have believed and posted before STE +2 can be created not only with 36/xx but also with 34/xx, 32/xx and 30/xx ... as the state machine indicates

The only thing that's important is to be at 60Hz at cycle 36 and back at 50Hz at cycle 56)
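
The state machine lines above can be turned into a small sketch. This models only the hypothesis as written, with RES assumed LO for the whole line, and it treats a frequency write as being in effect at the cycle it is made (an assumption). All names (`Positions`, `run_line`, `is_plus_2`) are made up for illustration, and the end-of-line checks are left out.

```cpp
// Sketch of the STF/STE start-of-line checks listed above (RES == LO assumed).
struct Positions {
    int ntsc_start;   // IF(FREQ == 60) H = TRUE
    int pal_start;    // IF(FREQ == 50) H = TRUE
    int line_check;   // IF(FREQ == 50) LINE = PAL
};

constexpr Positions kStf{52, 56, 54};
constexpr Positions kSte{36, 40, 56};

struct LineResult {
    int  h_start;     // cycle where H was raised, -1 if never
    bool pal_line;    // true if the line length was decided as PAL
};

// freq_at(cycle) must return 50 or 60: the FREQ value seen at that cycle.
template <typename FreqAt>
LineResult run_line(const Positions& p, FreqAt freq_at) {
    LineResult r{-1, false};
    if (freq_at(p.ntsc_start) == 60) r.h_start = p.ntsc_start;
    if (freq_at(p.line_check) == 50) r.pal_line = true;
    // The PAL start check only matters if H was not already raised.
    if (r.h_start < 0 && freq_at(p.pal_start) == 50) r.h_start = p.pal_start;
    return r;
}

// +2 under this hypothesis: started at the NTSC position on a PAL-length line.
inline bool is_plus_2(const Positions& p, const LineResult& r) {
    return r.h_start == p.ntsc_start && r.pal_line;
}
```

With this model, any switch to 60Hz at or before cycle 36 that is undone by cycle 56 gives `is_plus_2` on STE, which is what the 34/xx, 32/xx and 30/xx cases suggest.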

Dio wrote:So it's Glue's decoding logic that indicates (I think on the /RAM and/or /DEV signals) to MMU that it's MMU's responsibility to handle the operation and MMU takes it from there.

Thanks. This logic has always fascinated me but I never took the time to learn it. Appreciated.

Steven Seagal · Post by **Steven Seagal** » Sun Oct 06, 2013 11:49 am

troed wrote: (Btw, contrary to what I have believed and posted before STE +2 can be created not only with 36/xx but also with 34/xx, 32/xx and 30/xx ... as the state machine indicates The only thing that's important is to be at 60Hz at cycle 36 and back at 50Hz at cycle 56)

See, you got confused by those STE lines +2 too.
Before latest fix, Steem would also only take a change at cycle 36. It seems to work because the rare cases (Forest STE and ?) use this timing.

Steven Seagal · Post by **Steven Seagal** » Sun Oct 06, 2013 12:01 pm

troed wrote: (And if it preloads for 16 cycles in high res as well then it actually preloads for 8 cycles before even knowing if there's a screen to start displaying. THAT would look strange in a trace .. )

edit: Maybe I should point out that I assume horizontal scroll is always 16 pixels in all resolutions and thus pre-fetch should need one word (4 cycles) in high res, two words (8) in medium and four (16) in low. That works out well with the timings as well, so, no strange traces. It was hypothetical

/Troed

I'm sure it will come down to this because it's the best explanation for the timing difference in STE's left border off trick. Like you say 16 cycles off would make strange traces.
It's not prefetch but the scrolling itself after prefetch that takes #cycles needed for one raster.
Prefetch is always 16 cycles, but then there's scrolling - of one raster, then the still strange latency then rendering.

troed · Post by **troed** » Sun Oct 06, 2013 10:35 pm

Steven Seagal wrote:See, you got confused by those STE lines +2 too.
Before latest fix, Steem would also only take a change at cycle 36. It seems to work because the rare cases (Forest STE and ?) use this timing.

I guess you could say that

It wasn't until I started documenting the state machines that I realized that there seemed to be no reason for STE to do anything between the "just blank" (no pixels displayed) lines that end at FREQ 28* and cycle 36 for +2 as documented in the 2006 thread. Then it struck me that the tests MiggyMog did for Paulo began at cycle 36. So, cycle 32 (or 34,30) were never tested at all, see http://www.atari-forum.com/viewtopic.ph ... 7&start=81

(So I did, a few days ago. I will continue testing the last few remains of STE behaviour that might differ from the extensive tests Paulo gave us with the excel sheet)

Steven Seagal wrote:I'm sure it will come down to this because it's the best explanation for the timing difference in STE's left border off trick. Like you say 16 cycles off would make strange traces.
It's not prefetch but the scrolling itself after prefetch that takes #cycles needed for one raster.
Prefetch is always 16 cycles, but then there's scrolling - of one raster, then the still strange latency then rendering.

Here's my current thinking on pre-fetch and why it creates +20 (left border) as well as +4/+6 (regular lines). I don't think it's because of GLUE and I don't think it's the Shifter either at the moment. MMU is the one that decides the values of the current screen address and seems to be the only one where the timing and signals fit.

As always - comments are very welcome. All hypotheses need to be tested and if they survive they become gospel

On STE the check to see if we should start a hires (= left border) screen is done 4 cycles earlier than on ST. This matches the 16 cycle difference for lores.

0 IF(RES == HI) H = TRUE

The reason for this is pre-fetch, needed to be able to hardware scroll. It's always done, but not used unless other STE-specific registers are set. Thus the MMU receives this information and starts LOADing up the Shifter. One word per 4 cycles. For hires, one word is all that's needed for 16 pixels - in lores four words are needed (16 cycles, 8 bytes).

40 IF(FREQ == 50) && (RES == LO) H = TRUE

Let's start at lores. At cycle 40 the pre-fetch starts. 4 cycles later a word has been LOADed into the Shifter - but the actual screen memory address is not touched at this time. The MMU simply does not know if this will be used. If a switch to hires is made at cycle 44 however, the MMU finds itself in hires pre-fetch which is only 4 cycles long so it's already done. It thus updates the screen memory address with 6 bytes read - 8 minus the word already done.

The same happens at cycle 48 - two words have been LOADed into the Shifter, which is well beyond a hires pre-fetch, and when the resolution switch is made the screen memory address is updated with 4 bytes read - 8 minus the two words already done.

(I tried a short switch, avoiding 56, at cycle 52 but did not get +2 as I had hoped for)

At cycle 4, if we go back to lores, the MMU finds that there are still 12 cycles left to pre-fetch for lores. It's thus not until cycle 16 that the real signal for the Shifter to start displaying data is sent, and that the MMU starts updating the video counter registers: (56-16)/2 = 20.

56 IF(FREQ == 50) LINE = PAL

Sounds like a reasonable hypothesis? It comes with a caveat: It requires the MMU to know about resolution, something it obviously can do being merged with GLUE in STE but it's not known behaviour and it differs from ST. Also, I use CPU cycle timing even though there's a delay between it and MMU reacting. Not sure how that deals with the overshoot-20 byte thing.
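
The arithmetic in this hypothesis can be written out as a sketch. The constants come straight from the text (lores pre-fetch starts at cycle 40, hires check at cycle 0, one word per 4 cycles, four words for lores); the function names are made up, and the value for cycle 52 is only what the formula predicts, not something that was observed.

```cpp
// Sketch of the pre-fetch byte counts in the hypothesis above.
constexpr int kCyclesPerWord      = 4;   // MMU LOADs one word per 4 cycles
constexpr int kLoresPrefetchWords = 4;   // 16 pixels in lores = 4 words = 8 bytes
constexpr int kLoresPrefetchStart = 40;  // STE: IF(FREQ == 50) && (RES == LO)
constexpr int kHiresCheck         = 0;   // STE: IF(RES == HI)
constexpr int kLoresPrefetchCycles = kLoresPrefetchWords * kCyclesPerWord;  // 16

// Extra bytes counted when switching to hires during the lores pre-fetch:
// 8 bytes minus 2 bytes per word already LOADed.
constexpr int extra_bytes_on_hires_switch(int switch_cycle) {
    return 2 * (kLoresPrefetchWords
                - (switch_cycle - kLoresPrefetchStart) / kCyclesPerWord);
}

// Left border: display starts 16 cycles after the hires check instead of
// 16 cycles after the lores check, at 2 cycles per byte.
constexpr int left_border_extra_bytes() {
    return ((kLoresPrefetchStart + kLoresPrefetchCycles)
            - (kHiresCheck + kLoresPrefetchCycles)) / 2;
}

static_assert(extra_bytes_on_hires_switch(44) == 6, "+6 line");
static_assert(extra_bytes_on_hires_switch(48) == 4, "+4 line");
static_assert(left_border_extra_bytes() == 20, "(56-16)/2 = 20");
```

The same formula gives 2 for a switch at cycle 52, but as noted above that +2 did not show up in the test, so the model is incomplete at best.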

I'll update the wiki with this hypothesis if it survives a few days without obvious holes.

/Troed

*) I'm getting nowhere just pondering what it is that causes these so when I have the STE values for FREQ and RES I'll add those and Paulo's ST values to the state machines

Steven Seagal · Post by **Steven Seagal** » Mon Oct 07, 2013 7:15 pm

OK I've looked at part of the "state machine", and it makes sense.
In a later version of Steem, not v3.5.3 because it's risky and demands testing, some parts where we're still looking for specific switches should be replaced with checking the state.

For example for one type of 0-byte line, we have something like this:

    if(ShiftModeChangeAtCycle(464-4)==2 && !ShiftModeChangeAtCycle(464+8))
      CurrentScanline.Tricks|=TRICK_0BYTE_LINE;

Function names are self-explanatory.
This looks for an R2/R0 switch at that timing in a somewhat rigid way. But notice we already hinted at the reason by isolating cycle 464 (the cycle of HSync).

Instead we would have something like this:

    if(ShiftModeAtCycle(464)==2)
      CurrentScanline.Tricks|=TRICK_0BYTE_LINE;

Which is simpler and more performant, and will work with new cases.
Except that '464' should be computed for: STF/STE, 50Hz/60Hz, WU state...
Other 0-byte lines also make sense: at the cycle of "stop HSync", at the cycle of "stop HBlank", etc.
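
To make the difference between the two approaches concrete, here is a self-contained sketch. It is not Steem code: `RegWrite` and the helper functions are made-up stand-ins, a write is assumed to take effect at the cycle it is made, and the "switch" check reflects my reading of the rigid R2/R0 test (R2 written at 464-4, R0 written at 464+8).

```cpp
#include <vector>

// One write to the shift mode register. The list must be sorted by cycle.
struct RegWrite { int cycle; int value; };

constexpr int kHSyncCycle = 464;  // should really depend on STF/STE, 50/60Hz, WU state

// Shift mode in effect at `cycle` (state-based view).
inline int shift_mode_at_cycle(const std::vector<RegWrite>& writes, int cycle,
                               int initial = 0) {
    int mode = initial;
    for (const RegWrite& w : writes) {
        if (w.cycle > cycle) break;
        mode = w.value;
    }
    return mode;
}

// Value written exactly at `cycle`, or -1 if there was no write (switch-based view).
inline int shift_mode_change_at_cycle(const std::vector<RegWrite>& writes, int cycle) {
    for (const RegWrite& w : writes)
        if (w.cycle == cycle) return w.value;
    return -1;
}

// Rigid check: needs the exact R2/R0 timing.
inline bool zero_byte_by_switch(const std::vector<RegWrite>& w) {
    return shift_mode_change_at_cycle(w, kHSyncCycle - 4) == 2
        && shift_mode_change_at_cycle(w, kHSyncCycle + 8) == 0;
}

// State check: only asks what the mode is at the HSync cycle.
inline bool zero_byte_by_state(const std::vector<RegWrite>& w) {
    return shift_mode_at_cycle(w, kHSyncCycle) == 2;
}
```

A switch such as R2 at 456 and R0 at 468 is missed by the rigid check but caught by the state check, which is the kind of "new case" the simpler test would cover.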

Steven Seagal · Post by **Steven Seagal** » Thu Oct 10, 2013 7:12 pm

While running some checks I noticed a "line +2" oddity on the STE.

If you switch from 50hz to 60hz at cycle 40, the line isn't +2. (Forest STE tests)
If you switch from 60hz to 50hz at cycle 40, the line isn't +2. (Darkside of the Spoon first screen, it was broken in STE mode, that's how I noticed the problem)

I don't know if that makes sense. In Steem there's a single check for STF, a double for STE.
In STF mode, frequency must be 60hz at cycle 54.
In STE mode, frequency must be 60hz at cycles 38 and 42.

The messy code below works with the cases I know:

    t=56-16; //The line must "start" earlier on STE due to HSCROLL
#if defined(SS_MMU_WAKE_UP_SHIFTER_TRICKS)
    if(MMU.WakeUpState2())
      t+=2;
#endif

#if defined(SS_STF)
    if(ST_TYPE!=STE)
      t+=16;
    else
#endif
    if(ShiftModeAtCycle(t)==1)
      t+=8; // MED RES, STE starts only 8 cycles earlier

    if(/*CyclesIn>=t+2 &&*/
      FreqAtCycle(t-2)==60 
      && (
#if defined(SS_STF)
          ST_TYPE!=STE || // STF: the single check above is enough
#endif
          FreqAtCycle(t+2)==60 ) // STE: must still be 60hz 4 cycles later
      && (FreqAtCycle(376)==50 || (CyclesIn<376 && shifter_freq==50))) //TODO WU?
    {
      CurrentScanline.Tricks|=TRICK_LINE_PLUS_2;
//      TRACE("CyclesIn %d t %d\n",CyclesIn,t);
  //    VideoEvents.ReportLine();
    }
