horizontal scrolling on ST

Post by Dio »

There isn't a convention on these things. Indeed, I'm aware of more computers whose counter origin is the top-left screen fetch than computers where it is tied to the sync positions.

The 14-byte line case may demonstrate there isn't one falling edge of HSYNC though. Plus that then creates the problem with needing to determine the line length at an unusual position.

I think it's probable that the correct zero for the time is the point at which the internal counter's length for a given scanline is decided. This may also be the counter reset point. It's why I've been looking at where VSYNC gets incremented, assuming it happens due to counter reset.

Post by Steven Seagal »

Dio wrote:So firstly, the DE to shifter LOAD is variable depending on the wakeup state. Hence my notation DL3-DL6 for each wakeup state.
I see, I finally understand this DL3-DL6 notation. :) This would imply that the shifter counter starts running later according to WU.

Dio wrote:Then if you look at the traces, you can see where the writes to the registers are happening. Look at the A23 line - those indicate accesses to memory addresses with the top bit set - i.e. hardware registers or ROM. If you can also see R/W low, then it's a write. So if you look at, say, the 158-byte line, you can see a pair of writes either side of the disabling of DE, which must be the two writes to the syncmode register.
This is indeed interesting, apparently DE is very close to those writes, around cycle 372.

Dio wrote:Similarly, the 14-byte line shows two HW register writes either side of the small block of DE. So you can use those to correlate emulator time with the traces.

Dio wrote:But there's a deeper can of worms here: what is the accurate definition of 'cycle 376'?

Dio wrote:You say "cycle 376 sync 0 -> right border off". But that's a huge simplification of what actually happens.
I didn't mean to explain it all but to illustrate again what's understood by "emulator cycles".
There's a relation between those cycles and pixel cycles ("Paulo cycles"):
emulator cycles = Paulo cycles +83
Paulo cycle 1 = emulator cycle 84.
Here's a constant in Steem:

Code: Select all

#define CYCLES_FROM_HBL_TO_LEFT_BORDER_OPEN 84
This means that a palette change at cycle 84 could affect pixel 1 of a normal 50hz scanline.
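As a minimal sketch of the relation above (the function names are made up for illustration, and only the +83 offset and the Steem constant come from the post), the conversion between the two numbering schemes could look like this:

```python
# Offset quoted in the post: emulator cycles = Paulo cycles + 83.
PAULO_TO_EMULATOR_OFFSET = 83

# Steem constant quoted in the post.
CYCLES_FROM_HBL_TO_LEFT_BORDER_OPEN = 84


def paulo_to_emulator(paulo_cycle: int) -> int:
    """Convert a pixel ("Paulo") cycle to an emulator line cycle."""
    return paulo_cycle + PAULO_TO_EMULATOR_OFFSET


def emulator_to_paulo(emulator_cycle: int) -> int:
    """Convert an emulator line cycle back to a pixel ("Paulo") cycle."""
    return emulator_cycle - PAULO_TO_EMULATOR_OFFSET


if __name__ == "__main__":
    # Paulo cycle 1 should land on the left border constant.
    print(paulo_to_emulator(1) == CYCLES_FROM_HBL_TO_LEFT_BORDER_OPEN)
```

This only restates the fixed offset for WU1; it says nothing about where either zero sits relative to the hardware signals.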
In the graph I commented on, leaving aside WU states, you would have DE activating around cycle 56, then the latency, the prefetch, another latency, and finally the pixel being rendered around cycle 84. I assumed that's when R2 G2 B2 change, but I'm not sure.
Dio wrote:If we consider the emulator cycle to start at the beginning of that state machine (MC1) then the CPU write hits the Glue in MC1 or MC2 (assuming it's in normal phase and not on the normally unavailable half cycles). The Glue then does the comparison - but not immediately, but at some later point (probably 1-4 cycles later depending on the wakeup state). 1-4 cycles after that the MMU sees DE, and probably 3 cycles after that it issues the first /LOAD.
Emulator cycles start from 0 at the very first frame, then reset to 0 after every 512 cycles in a 50hz frame.
In Steem, you have this simple schema for WU:

Code: Select all

State 1
+-----+
| CPU |
+-----+
| MMU |
+-----+


State 2
+-----+
| MMU |
+-----+
| CPU |
+-----+
We generally consider WU1 when talking about cycles.
Dio wrote:At the moment emulators perform the simplification, and it almost works because in general wakeup states are ignored, everything quantises to a nice 4-cycle boundary, the effects of the two variable delays are inverted and so cancel out, and it 'just works'. But that's not a true simulation of what's going on - it's just a HLE (High Level Emulation) that emulates the effect rather than the signals in any great detail.

Dio wrote:In particular, it doesn't lead to any greater insights about the hardware. In order to gain those, it's necessary to unpick all the fiddly little details and properly consider all the wakeup states. Especially when it comes to considering the +2 cases where the write to the Glue happens half-way between two CPU phases.
In the current version of Steem, WU isn't generally ignored, but as we haven't mastered this, it's optional.
The 1st goal is to run programs.
In all versions of Steem cycle precision is 2, not 4.

Dio wrote:So the question I'm trying to answer is the root hardware one: what is 'cycle 376'? What does zero mean in this numbering system? Is the write actually happening on cycle 376, or is it cycle 377? Where does Glue do its comparisons? At what point does MMU react? What is the logic - and latency - in the Shifter?
That's what I try to answer, practically. This is the timing when the write happens.
You identified yourself places where we could attach emulator cycles to the graph, around cycle 372 for line -2.
I already hinted at the time when the shifter counter starts running.
Dio wrote:This may lead to the ability to do a genuine low level simulation, rather than just trapping write addresses, looking up a table and seeing what's supposed to happen.
You know I like this approach as I implemented real emulation of PC offset in bus errors.

One caveat is performance, though the shifter trick tests themselves can already be quite taxing, especially in Steem.
Just compare CPU usage for desktop with Overscan Demos #6.
And of course, it must work.
In the CIA we learned that ST ruled
Steem SSE: http://sourceforge.net/projects/steemsse

Post by troed »

troed wrote:Cycle 464: HSYNC start

Matches "No Line 1 end" - I assume a switch here causes no HSYNC to appear at this position (or the whole line?)
... and going back to the 2006 thread where TCB's faulty auto detection was discussed we can verify this with Paolo's investigation into how that did/did not result in additional line lengths depending on whether right border had been opened :P

http://www.atari-forum.com/viewtopic.ph ... &start=114

I think the most interesting tidbit from that is that cancelling out HSYNC will not increase the HBL counter (additional lines drawn at the bottom of the screen). It's obvious it will always distort sync and thus cannot be used to create new line lengths. However, I guess it's possible the HBL-increase could be done at another time and just depends on there having been an HSYNC pulse.

Of note: There's a clear example there of how the video counter is increased separately from the Shifter receiving data from the MMU, if I understand Paolo's description correctly.



With regards to the 14-byte line, it doesn't need to be created with a "long left border" switch. A regular left border opening can be done, followed by another HI/LO switch that covers cycle 32:
troed wrote:Cycle 32 (30 really, but this is where I'm being liberal with a 2 cycle difference): BLANK end

Matches "0 byte line + BLANK" - I assume a switch here causes an extended BLANK over the whole line and no DE.
The result is the same, it still distorts sync. However, I'm guessing the two switch method version wouldn't cause an earlier BLANK end* as is visible in the trace diagram (where it happens on cycle 24, a new HSYNC then follows at cycle 32 which immediately - 4 cycles later - causes BLANK to go off again and at the same time DE is deactivated).

As to _why_ it's possible to get a second HSYNC I think we need to ponder exactly how the hardware is wired up into a state machine. I also think it's quite specific to cycle 32. I think this has to do with the hardware being wired to handle both the hi-res and lo-res cases without switching between two modes, i.e. knowing what the pulses look like for a regular hi-res scanline would possibly provide new insight.

At this moment I don't think it's possible to create a stable 14 byte line. In my view it's obvious that the second HSYNC is the cause for DE to deactivate. Trying to avoid the second HSYNC would then create a regular 186 byte line. (Yes, I've tried switching to external sync (!) at that position without success).

/Troed

*) Because nothing differs up until cycle 24 compared to an 80 byte line which does not show an early BLANK end in the trace diagram
Last edited by troed on Sat Sep 21, 2013 6:39 pm, edited 1 time in total.

Post by Steven Seagal »

troed wrote: Of note: There's a clear example there of how the video counter is increased separately from the Shifter receiving data from the MMU, if I understand Paolo's description correctly.
Normally there's only one counter, so it must point to the correct address before LOAD.

Post by troed »

Steven Seagal wrote:
troed wrote: Of note: There's a clear example there of how the video counter is increased separately from the Shifter receiving data from the MMU, if I understand Paolo's description correctly.
Normally there's only one counter, so it must point to the correct address before LOAD.
You're right, the described effect is probably just due to alternating BLANK. It's somewhat interesting that it alternates though, and at the same time it controls whether DE can be modified with normal sync switches.

(Again referencing the 14 byte case where the second HSYNC causes BLANK _and_ DE to change)

I wonder if at least BLANK is not set/cleared but inverted by the state machine. Maybe that's even true for other signals.

Post by Dio »

troed wrote:I wonder if at least BLANK is not set/cleared but inverted by the state machine. Maybe that's even true for other signals.
I reckon this is unlikely. I can't see any sign of it in the traces - plus an S/R flip-flop is two gates while a toggling flip-flop is six.

I agree that it's likely that the state machine clears DE whenever HSYNC is set and that the 14-byte line will probably prove unobtainable without the corrupt HSYNC.

Post by Steven Seagal »

Coming back to the SNYD/TCB screen...

In the doc by Alien, we have this:
It is therefore sufficient to switch to monochrome to activate H
and DE, and therefore force the MMU and the SHIFTER to start
decoding the useable screen. One returns to low or medium
resolution to actually see the useable screen on the RGB pins.
Thus one obtains at 50Hz lines of 160+26 = 186 bytes. At 60Hz
one obtains lines of 184 bytes.
The difference of 2 bytes
corresponds to the difference of 0.5us between the two line
lengths (63.5us at 60Hz and 64us at 50Hz).
I ran some tests in Steem, and emulation of TCB in WU1 (logo on the left) makes more sense with a left off +24 than having a normal left off +26 followed with a spurious "line -2" trick. Spurious because the frequency before and after left border removal is 60hz.
Edit: spurious in Steem, it is the '372:S0000' that is identified as provoking a line-2.

Code: Select all

Before:
-30 - 368:S0000 452:S0000 476:r0900 496:r0900 512:T0100 512:#0000
-29 - 004:r0900 024:r0900 044:r0900 064:r0900 084:r0908
-28 - 000:R0002 008:R0000 372:S0000 380:S0002 440:R0002 452:R0000 508:T2009 508:#0184
T2009 means "left off +26; line -2"

Now:
-30 - 388:S0000 472:S0000 496:r0900 512:T0100 512:#0000
-29 - 004:r0900 024:r0900 044:r0900 064:r0900 084:r0908
-28 - 000:R0002 008:R0000 372:S0000 380:S0002 440:R0002 452:R0000 508:T22000 508:#0184
T22000 means left off +24 (372:S0000 ignored)
In both cases we have a 184 bytes line, but in the first case this is 160+26-2=184, in the second 160+24=184.

But when I check Paolo's sync line tester, line 184 is done with left off, line -2, so I wonder if the line +24 trick described by Alien is relatively unknown or impractical?

Post by troed »

Steven Seagal wrote: I ran some tests in Steem, and emulation of TCB in WU1 (logo on the left) makes more sense with a left off +24 than having a normal left off +26 followed with a spurious "line -2" trick. Spurious because the frequency before and after left border removal is 60hz.

In both cases we have a 184 bytes line, but in the first case this is 160+26-2=184, in the second 160+24=184.

But when I check Paolo's sync line tester, line 184 is done with left off, line -2, so I wonder if the line +24 trick described by Alien is relatively unknown or impractical?
In 60Hz the line starts 2 bytes earlier and ends 2 bytes earlier compared to 50Hz. So, it's only what the frequency is at the start and end positions that decides when DE will be activated/deactivated (this beside the position that decides whether it's a 508 or 512 cycle line). The line won't be spuriously "-2" - if it's in 60Hz at that position that's where it'll deactivate DE :) Since the start of the left border doesn't differ in 50 and 60Hz you could also see it as the width of the left border being 24 bytes in 60Hz vs 26 bytes in 50, but that's still due to the frequency at the +2 position at the start of the line so it doesn't add any new info and shouldn't change anything in your code if it already detects that.

As for the TCB screen specifically: if we're talking about the line that's 4 cycles too early, a normal right border switch will indeed become a short right (-2). Right? :P

Post by Steven Seagal »

You see this 'right' but there are still some details 'left'...

1. The 'spurious' side is at the time of the test at cycle 372, the 'change' to 60hz is taken (in Steem and I think Hatari) even though the line is already at 60hz. At the end of the line, what matters is that we have 184 bytes though.

2. A question is why left off at 60hz isn't generally used to obtain a line 184?

Post by troed »

Steven Seagal wrote:You see this 'right' but there are still some details 'left'...

1. The 'spurious' side is at the time of the test at cycle 372, the 'change' to 60hz is taken (in Steem and I think Hatari) even though the line is already at 60hz. At the end of the line, what matters is that we have 184 bytes though.

2. A question is why left off at 60hz isn't generally used to obtain a line 184?
(I will probably have time to update the state machines from the previous page including ff820a and ff8260 both being GLUE and with internal delays due to wakestates, as well as the timing from CPU cycles to pixel display, this weekend btw)

Think state machines instead of switches* :D The line will end at 372 if the frequency is 60Hz. It will end at 376 if the frequency is 50Hz. Tested at those specific cycles - it doesn't matter when the switch to 60Hz is made - only what the state is when tested.

The reason for not opening the left border with 71Hz and then switching back to 60Hz until line end to get 184 bytes is because the state machine will check for 50Hz at cycle 54 and if true the line is 512 cycles, if it's 60Hz the line will be 508 cycles. We don't want to mix 512 and 508 cycle lines so at that specific position we must be in 50Hz.

(Cycle values are for WS1/3, in WS2/4 all ff820a state checks happen 2 cycles later)
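To make the "state, not switch" point concrete, here is a small Python sketch. It assumes only the cycle values given above (54, 372, 376, and the 2-cycle delay in WS2/4); the helper names and the `freq_at` callback are hypothetical, not taken from any emulator:

```python
# Extra delay of the ff820a checks per wake state, as stated in the post:
# values are for WS1/3, and WS2/4 test 2 cycles later.
FREQ_CHECK_DELAY = {1: 0, 2: 2, 3: 0, 4: 2}


def line_length_cycles(freq_at_54: int) -> int:
    """512-cycle line if 50 Hz is seen at the cycle-54 check, else 508."""
    return 512 if freq_at_54 == 50 else 508


def line_end_cycle(freq_at, wakestate: int = 1):
    """Return the cycle where the line ends, or None if neither test fires.

    freq_at(cycle) must return the frequency (50 or 60) that is latched
    at that cycle; when the switch itself was written does not matter.
    """
    delay = FREQ_CHECK_DELAY[wakestate]
    if freq_at(372 + delay) == 60:
        return 372 + delay
    if freq_at(376 + delay) == 50:
        return 376 + delay
    return None  # right border stays open


if __name__ == "__main__":
    switched_early = lambda cycle: 50 if cycle < 100 else 60
    switched_late = lambda cycle: 50 if cycle < 370 else 60
    # Both give the same line end: only the state at the test matters.
    print(line_end_cycle(switched_early), line_end_cycle(switched_late))
```

A frequency that is 50 Hz at 372 but 60 Hz at 376 makes both tests fail, which is how this sketch represents a right border removal.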

/Troed

*) I fully realize why for performance reasons emulators look for "switches" but it's conceptually very wrong

Post by Steven Seagal »

Looking for switches makes sense while knowledge isn't perfect, eg for all the 0-byte lines mess.
And the 'line +2' rules for STF and STE are strange to say the least.
Even for the left border we're still poking, and soon I'll make some revelations!
Conceptually, saying "left off at 60hz=+24 bytes" is nicer than "left off=+26 bytes but if line ends at cycle 372, -2 bytes".
It didn't start as '+26'.
Or maybe with a state machine bytes would be counted +2 every 4 cycles while DE is on. Nice, but a challenge for emulation and for your regular 160 bytes line, lots of useless computations.

Post by npomarede »

Steven Seagal wrote:Looking for switches makes sense while knowledge isn't perfect, eg for all the 0-byte lines mess.
And the 'line +2' rules for STF and STE are strange to say the least.
Even for the left border we're still poking, and soon I'll make some revelations!
Conceptually, saying "left off at 60hz=+24 bytes" is nicer than "left off=+26 bytes but if line ends at cycle 372, -2 bytes".
It didn't start as '+26'.
Or maybe with a state machine bytes would be counted +2 every 4 cycles while DE is on. Nice, but a challenge for emulation and for your regular 160 bytes line, lots of useless computations.
It may sound nicer, but it could be wrong too. The final line length is not the only parameter to emulate; you must also take into account when DE starts and stops.
For example, a normal 160 byte line at 50 Hz is not centered on screen the same way as a 160 byte line at 60 Hz ; the 60 Hz line starts 4 cycles earlier and as a result it's shifted 4 pixels to the left when compared with the 50 Hz line.
Same thing goes when mixing some left/right border, the fact that some +2/-2 bytes compensate each other doesn't mean the line will be centered the same way on screen.
I had to take this into account in Hatari, because if you mix this with color changes in the background they will not be correctly aligned with the bitmap.

Nicolas

Post by troed »

Steven Seagal wrote:Looking for switches makes sense while knowledge isn't perfect, eg for all the 0-byte lines mess.
And the 'line +2' rules for STF and STE are strange to say the least.
Even for the left border we're still poking, and soon I'll make some revelations!
Conceptually, saying "left off at 60hz=+24 bytes" is nicer than "left off=+26 bytes but if line ends at cycle 372, -2 bytes".
It didn't start as '+26'.
I agree that there could be many reasons for approximating the state machine (as per target hardware, or as close as we can describe it in pseudo-code as I'm trying to do here) differently in emulators. However, when reasoning around how things work (and possibly in implementing "compatibility-modes" in emulators and FPGAs) it's much easier to have the state machine in mind. Your comment about "line +2 rules for STF and STE being strange" is actually a perfect example.

Excerpt from work in progress - I hope to be able to post a full updated version (includes BLANK, HSYNC etc) of where I'm currently at during the weekend:

Code: Select all

      6    IF(RES == HI) H = TRUE
STE  36    IF(FREQ == 60) H = TRUE
STE  40    IF(FREQ == 50) H = TRUE
STF  52    IF(FREQ == 60) H = TRUE
     54    IF(FREQ == 50) LINE = PAL
STF  56    IF(FREQ == 50) H = TRUE
    372    IF(FREQ == 60) H = FALSE
    376    IF(FREQ == 50) H = FALSE
The above explains +2 and 0-byte lines for STF and STE. At those specific cycles you check what the current value of ff820a is and set the values for either "DE active"* and/or whether the linetype is 508 cycles (default) or 512 ("LINE = PAL").

The reason why the state machine checks for STE are different (16 cycles earlier) from STF is completely logical. The STE screen always starts 16 pixels earlier - due to its hardware scroll capability - and a separate register then controls which pixel starts where. That's the reason for DHS's (right?) trick with being able to always switch those 16 pixels on and gain screen width.

So, as for the 184 byte line in 60Hz (STF):

At cycle 6 RES=HI and thus we start the screen (activate H). We then stay in 60Hz, which means that at cycle 52 nothing happens, the screen is already started. At cycle 54 a check is made to see if FREQ is 50 (it isn't) which means we stay with a 508 cycle line. Nothing happens at 56 since FREQ is 60 so the next thing that happens is at 372 where the check for FREQ 60 is true and H is deactivated.

Compare this with a 184 byte line in 50Hz (STF):

At cycle 6 RES=HI and we start the screen. We go back to 50Hz which means that the check at cycle 52 is false, the check at 54 is however true and we switch to line type 512 cycles. The check at 56 is also true, but the screen is already started. We now wait until we get to the right border position where we switch frequency to 60Hz before the check at cycle 372 - and thus H is deactivated.

The only difference between these two cases is whether line type will be 508 cycles (60Hz) or 512 cycles (50Hz) - but the state machine accurately captures both cases.

Try walking through the state machine the same way regarding +2 and 0-byte cases. If I haven't made a complete fool out of myself they should trigger correctly (and as Nicolas mentions the 60Hz line will both start and end 4 cycles earlier).
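The walk-throughs can also be replayed mechanically. Below is a rough Python sketch of the excerpt table only (no BLANK, HSYNC, wake states or MMU delay); `CHECKS`, `run_line` and the callback helpers are names invented for this illustration, and treating a repeated "H = TRUE" as a no-op is an assumption:

```python
# Each row: (cycle, model or None for both, register, value tested,
#            signal changed, new value). Transcribed from the excerpt.
CHECKS = [
    (6,   None,  "RES",  "HI", "H",    True),
    (36,  "STE", "FREQ", 60,   "H",    True),
    (40,  "STE", "FREQ", 50,   "H",    True),
    (52,  "STF", "FREQ", 60,   "H",    True),
    (54,  None,  "FREQ", 50,   "LINE", "PAL"),
    (56,  "STF", "FREQ", 50,   "H",    True),
    (372, None,  "FREQ", 60,   "H",    False),
    (376, None,  "FREQ", 50,   "H",    False),
]


def run_line(model, res_at, freq_at):
    """Walk one scanline; return (H on cycle, H off cycle, line type)."""
    h, h_on, h_off = False, None, None
    line = "NTSC"  # 508 cycles unless the cycle-54 check sees 50 Hz
    for cycle, only, reg, wanted, signal, value in CHECKS:
        if only is not None and only != model:
            continue  # check belongs to the other machine
        seen = res_at(cycle) if reg == "RES" else freq_at(cycle)
        if seen != wanted:
            continue
        if signal == "LINE":
            line = value
        elif value and not h:
            h, h_on = True, cycle   # screen starts here
        elif not value and h:
            h, h_off = False, cycle  # screen ends here
    return h_on, h_off, line


def hi_at_6(cycle):
    """Left border opening: hi-res is only set around cycle 6."""
    return "HI" if cycle == 6 else "LO"


def always_lo(cycle):
    return "LO"


if __name__ == "__main__":
    # 184-byte line in 60 Hz, then in 50 Hz with a switch before 372.
    print(run_line("STF", hi_at_6, lambda c: 60))
    print(run_line("STF", hi_at_6, lambda c: 50 if c < 370 else 60))
```

Both calls report H from 6 to 372 and differ only in the line type, which is the point made above. Feeding other `freq_at` patterns (60 Hz only at cycle 52, or only at cycle 56) reproduces the +2 and 0-byte cases in the same way.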
Or maybe with a state machine bytes would be counted +2 every 4 cycles while DE is on. Nice, but a challenge for emulation and for your regular 160 bytes line, lots of useless computations.
The states (GLUE) only have to be checked at the specific cycles in the table - not in between. While the full version I'll post contains checks for BLANK and HSYNC that's above and beyond current emulation - removing them will lessen the amount of computation. A quick look** at my notes indicates that ~nine checks per line will capture all known line lengths for both STF and STE. Separating the code bases will bring that down to ~seven. edit: And since those checks follow the hardware they will work with future demos as well doing switches at different positions compared to current ones :angel:

/Troed

*) I'll get back to this, DE becomes active slightly later than H is set to true. There are internal GLUE delays affecting this as well. It's possible to deduce the exact cycle lengths and thus bytes used for the borders in different modes from the full description. You're correct in that a left border in 60Hz is 24 bytes (since a 60Hz regular line starts 4 cycles earlier) - it's the distance between DE active for left border (hires screen start) and DE active for 60Hz line.

**) My first quick look was in error, it captured additional checks that I think no one uses. However, some of the additional checks _are_ probably used by demos even though they likely disrupt sync. In any case, I think it's time for emulation to start looking into running a state machine, at least as an option. It's not that heavy duty.

Post by Steven Seagal »

Yes I see that my big revelation will not surprise everybody...
Still here it is:

In highres, DE starts at linecycle 6 (+-WU), not 0 like it is generally assumed.
This is why a shift mode switch in the first cycles of the scanline will work.
Bonus bytes are 26 because DE is on from 6 to 58 (start of 50hz line), that is 52 cycles. 52/2 = 26.
In a 60hz line, there are only 24 bonus bytes (6 to 54 = 48, 48/2=24).

This justifies granting 24 bytes only for a left off at 60hz.

Edit: make -2 for the usual WU1 values: 4-56 for 50hz, 4-52 for 60hz
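The arithmetic can be checked with a tiny Python sketch. The function name is made up, and the only assumption is the one used above: 2 bytes are fetched every 4 cycles, i.e. one byte per 2 cycles of DE.

```python
def bonus_bytes(early_de_start: int, normal_de_start: int) -> int:
    """Extra bytes gained by starting DE early (1 byte per 2 cycles)."""
    return (normal_de_start - early_de_start) // 2


if __name__ == "__main__":
    print(bonus_bytes(6, 58), bonus_bytes(6, 54))  # 50hz line, 60hz line
    print(bonus_bytes(4, 56), bonus_bytes(4, 52))  # same with the WU1 values
```

Shifting both ends by 2 cycles for WU1 leaves the byte counts unchanged, as expected.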
Last edited by Steven Seagal on Fri Sep 27, 2013 5:30 pm, edited 1 time in total.

Post by Steven Seagal »

troed wrote: The reason why the state machine checks for STE are different (16 cycles earlier) from STF is completely logical. The STE screen always starts 16 pixels earlier - due to its hardware scroll capability - and a separate register then controls which pixel starts where. That's the reason for DHS's (right?) trick with being able to always switch those 16 pixels on and gain screen width.
There's much in your detailed post (thx), I'll try to answer this point now. I never paid much attention to this, but it would explain the STE difference for lines +2. For sure Steem will benefit from those insights.

The first raster needs to be scrolled by HSCROLL pixels, this means extra shifts before display. But the raster must be fetched (16 cycles) before it's scrolled, so it's like DE was on for prefetch, nothing is displayed nor fetched while the scrolling hasn't been applied, which takes 16 cycles (whatever HSCROLL, also 0), plus the usual latency, before pixels of first raster are rendered.

40: prefetch: 16 cycles
58: STE scroll: 16 cycles // don't display, don't fetch ??
86: render 1st pixel/ start fetching raster 2

Right? I don't claim it is, I try to understand.

Edit: and what about med res, high res? Can you scroll 1-15 in those resolutions? Does the line 50/60 decision occur at the same cycle in STE?
Last edited by Steven Seagal on Sat Sep 28, 2013 8:39 am, edited 1 time in total.

Post by Steven Seagal »

Hey I said revelations, so here's more about the left border. When we say DE starts at 6 (+-WU), the MMU still needs to prefetch 8 bytes (16 cycles), and there's the mysterious latency (12 cycles). Those values are the same in all resolutions. This would mean that in high res, first pixel would appear at linecycle 6+16+12=34.
On a 224 cycles scanline, this would make a pretty big border! And our beloved monochrome screens were known for a (ridiculously, say amiga fanboys) large border.

Post by troed »

Alright, here's my current state-of-progress when it comes to the GLUE state machine. I do not claim this to be infallible, I'd welcome input for improvement. It's based on Alien's pseudo code, Ijor's posts, Paolo's Excel-sheet and Dio's timing diagrams. And my thinking.

Note: This is meant to be as complete a description of the GLUE state machine as is currently known. It then combines with MMU (see below) and Shifter (not done yet) to produce the known video effects.

Code: Select all

VAR H; H signal as per Alien's doc, combines with V and becomes DE detected by MMU
VAR LINE = (LORES) NTSC by default; PAL = 512 cycle 50Hz, NTSC = 508 cycle 60Hz, HIRES = 224 cycle 71Hz
- inited at HSYNC
VAR RES; $ff8260, 0 = LO, 2 = HI
VAR FREQ; $ff820a, 0 = 60, 2 = 50
VAR BLANK; Activate blank if true. When BLANK is set no RES/FREQ checks are made (or simply no DE changes?)
VAR HSYNC; Activate hsync if true, will also deactivate H (that's why Enchanted Lands line continues on)

Default RES values are for WS3/4. In WS1 all RES state checks happen 2 cycles earlier, in WS2 2 cycles later.
Default FREQ values are for WS1/3, in WS2/4 all FREQ state checks happen 2 cycles later

6		IF(RES == HI) H = TRUE
[checks relating to "just blank" line skipped, want to understand how it functions before adding it]
32		IF(RES == LO) BLANK = FALSE
STE 36	IF(FREQ == 60) H = TRUE
STE 40	IF(FREQ == 50) H = TRUE
STF 52	IF(FREQ == 60) H = TRUE
54		IF(FREQ == 50) LINE = PAL
STF 56	IF(FREQ == 50) &&
58			(RES == LO) H = TRUE
166		IF(RES == HI) H = FALSE
186		IF(RES == HI) BLANK = TRUE
372		IF(FREQ == 60) H = FALSE
376		IF(FREQ == 50) &&
378			(RES == LO) H = FALSE
452		IF(RES == LO) BLANK = TRUE
464		IF(RES == LO) HSYNC = TRUE && H = FALSE
504		IF(RES == LO) HSYNC = FALSE
One (of many) things I haven't figured out is why the check for BLANK at 452 isn't affected (or is it?) by stabilizers (444-456 hi/low, 440-456 mid/lo). Alien says it is (hi/lo) causing 16 additional pixels to show. That takes us into HSYNC timing though.

MMU state machine related to DE as received from GLUE:

Code: Select all

VAR H; Same signal as raised by GLUE above (it's actually combined with V forming DE)
VAR DE; DE as detected by the MMU 

Note: These cycle timings are NOT affected by wake states

      8    IF(H == TRUE) DE = TRUE
STE  40    IF(H == TRUE) DE = TRUE*
STE  44    IF(H == TRUE) DE = TRUE*
STF  56    IF(H == TRUE) DE = TRUE
     60    IF(H == TRUE) DE = TRUE
    168    IF(H == FALSE) DE = FALSE
    376    IF(H == FALSE) DE = FALSE
    380    IF(H == FALSE) DE = FALSE
    468    IF(H == TRUE) DE = FALSE

*) STE due to hardscroll activates 16 pixels earlier, but only displays if additional registers are set
A key difference between this and earlier posts is the addition of an MMU "state machine" that picks up H from GLUE and checks to see whether DE should be activated, instead of just activating DE directly. This is due in part to logical thinking, Dio's comment that HSYNC-to-DE is the same in all wakestates as well as measuring the diagrams. DE is not activated at the cycles where it's checked if it should be, but later - and NOT wakestate dependent. The checks on the GLUE RES and FREQ registers are wakestate dependent, so the delay between the state-check and DE signal is variable. The earliest possible DE activation is the same cycle as the latest possible RES check (WS2) - it cannot be before. In real hardware this is likely to be caused by the MMU not "picking up" on GLUE DE activation until some cycles later due to not being clock synchronized, also aligning to four cycle boundaries.

Do note that there's a lag between FREQ checks and RES checks that looks like this, which explains the "IF X AND Y" checks at cycle 56/58 as well as at 376/378 in the GLUE state machine:

Code: Select all

WS1 (DL6): 0 cycles lag
- H is checked at FREQ 56, RES 56
WS4 (DL4): 0 cycles lag
- H is checked at FREQ 58, RES 58
WS3 (DL5): 2 cycles lag
- H is checked at FREQ 56, RES 58
WS2 (DL3): 2 cycles lag
- H is checked at FREQ 58, RES 60
When DE is activated there's then additionally a lag (3 to 6 cycles depending on wakestate, DL3-DL6) followed by LOAD (16 cycles) and 2 cycles more until first pixel is visible on screen. That means that from a regular DE activation at cycle 60 we have our first "pixel cycle" at 81 (wakestate 2), 82 (wakestate 4), 83 (wakestate 3) or 84 (wakestate 1).
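The sum can be written out as a short Python sketch. The DL value per wakestate comes from the lag table (WS1 = DL6, WS2 = DL3, WS3 = DL5, WS4 = DL4); the constant and function names are hypothetical:

```python
# DE-to-LOAD lag in cycles per wake state (the DLn notation).
DE_TO_LOAD_LAG = {1: 6, 2: 3, 3: 5, 4: 4}
LOAD_CYCLES = 16    # LOAD duration in cycles
LOAD_TO_PIXEL = 2   # further cycles until the first pixel is visible


def first_pixel_cycle(de_cycle: int, wakestate: int) -> int:
    """Cycle at which the first pixel shows up after DE goes active."""
    return de_cycle + DE_TO_LOAD_LAG[wakestate] + LOAD_CYCLES + LOAD_TO_PIXEL


if __name__ == "__main__":
    for ws in (1, 2, 3, 4):
        print(ws, first_pixel_cycle(60, ws))
```

For WS1 this gives 84, which matches the CYCLES_FROM_HBL_TO_LEFT_BORDER_OPEN value Steem uses.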

How to use the state machines to figure out line lengths:

0 byte line: DE never activated
54 byte line: DE activated at 60, deactivated at 168. (168-60)/2 = 54
56 byte line: DE activated at 56, deactivated at 168. (168-56)/2 = 56
80 byte line: DE activated at 8, deactivated at 168. (168-8)/2 = 80
158 byte line: DE activated at 60, deactivated at 376. (376-60)/2 = 158
160 byte line: DE activated at 60, deactivated at 380. (380-60)/2 = 160
162 byte line: DE activated at 56, deactivated at 380. (380-56)/2 = 162
184 byte line: DE activated at 8, deactivated at 376. (376-8)/2 = 184
186 byte line: DE activated at 8, deactivated at 380. (380-8)/2 = 186
204 byte line: DE activated at 60, deactivated at 468. (468-60)/2 = 204
206 byte line: DE activated at 56, deactivated at 468. (468-56)/2 = 206
230 byte line: DE activated at 8, deactivated at 468. (468-8)/2 = 230
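
The list above is the same formula applied to every pair of boundaries, so it can be generated rather than typed. A minimal Python sketch (names are illustrative; the boundary cycles are the ones from the state machine):

```python
# DE activation / deactivation boundaries from the state machine (in cycles).
DE_ON = (8, 56, 60)
DE_OFF = (168, 376, 380, 468)


def line_length(de_on, de_off):
    """Bytes fetched while DE is active: one byte per two cycles."""
    return (de_off - de_on) // 2


for off in DE_OFF:
    for on in DE_ON:
        print(f"DE {on:>2} -> {off}: {line_length(on, off)} byte line")
```

This prints twelve combinations. The one not in the list above is 56 -> 376, which is the regular 160 byte NTSC line.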

How to use the state machines to figure out border sizes:

DE activated at 8 compared to regular NTSC line at 56. (56-8)/2 = 24 bytes
DE activated at 8 compared to regular PAL line at 60. (60-8)/2 = 26 bytes

DE deactivated at 468 compared to regular PAL line at 380. (468-380)/2 = 44 bytes
DE deactivated at 464 compared to regular NTSC line at 376. (464-376)/2 = 44 bytes
- Note: An NTSC line is 508 cycles instead of 512 so the deactivation due to HSYNC will happen at 464
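
The border arithmetic can be sketched the same way. The names below are illustrative, and the NTSC HSYNC position is assumed to be the PAL one moved earlier by the 4 cycles the line is shorter, as the note above says.

```python
LINE_CYCLES = {"PAL": 512, "NTSC": 508}
REGULAR_DE = {"PAL": (60, 380), "NTSC": (56, 376)}  # (on, off) of a regular line
EARLY_DE_ON = 8          # DE activation with the left border removed
PAL_HSYNC_DE_OFF = 468   # DE deactivation forced by HSYNC on a 512 cycle line


def hsync_de_off(mode):
    """HSYNC cut-off, moved earlier by however much shorter the line is than PAL."""
    return PAL_HSYNC_DE_OFF - (LINE_CYCLES["PAL"] - LINE_CYCLES[mode])


def left_border_bytes(mode):
    """Extra bytes gained on the left when DE starts at cycle 8."""
    return (REGULAR_DE[mode][0] - EARLY_DE_ON) // 2


def right_border_bytes(mode):
    """Extra bytes gained on the right when DE runs until HSYNC."""
    return (hsync_de_off(mode) - REGULAR_DE[mode][1]) // 2


for mode in ("PAL", "NTSC"):
    print(mode, left_border_bytes(mode), right_border_bytes(mode))
```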

Future

There are some things not included yet:
  • The "just blank" lines possible to create with switches of both RES and FREQ
  • The 14 byte line (RES = HI at cycle 32 will cause HSYNC, which will cancel DE 4 cycles later: (36-8)/2 = 14)
  • Investigating the STE +2 line (is it possible to trigger at cycle 34 and earlier? If not, why not?)
  • The STE 20 byte left border (caused by the 504/4 switch - is it related to STE pre-fetching for hires?)
However, I'm tempted to claim that this research - which, again, is not at all my own original work but is based on input from many people - should go up on a wiki page. Also, I'd love to see an implementation in an emulator just to see how far this will take us and how many "special cases" are still needed (and then how to include them in the state machine description). Oh, and trace diagrams of NTSC and Hi res lines would be just awesome .. :)

/Troed

(This post might see edits during the weekend, I will try to avoid making too many short posts)

Also: Write up the Shifter state machine for shifting words. Alien's description is pretty much complete.
Last edited by troed on Sun Sep 29, 2013 10:44 pm, edited 6 times in total.
Dio
Captain Atari
Posts: 451
Joined: Thu Feb 28, 2008 3:51 pm

Re: horizontal scrolling on ST

Post by Dio »

I think you're being too complicated over DE. I'd be really surprised if it's not (DE = H and V).

It's a big plus to think like a hardware engineer. What's cheapest? What's easily made out of TTL circuits? (The custom chips were all prototyped in wirewrap).

NMOS allows really quite cheap comparisons using mask ROMs to decode - essentially it's just a matrix of wires and a diode to indicate 1s and 0s. My guess is that internally the horizontal state machine is a ROM which takes a 9-bit input (7 bits of counter, 50/60 syncmode and lowmed/high resolution) and outputs bits (probably 2 bits each, one to each side of an S/R flip-flop) for H, HSYNC and HBLANK. The really open question then is what controls counter reset. The 'simple' solution would be another output bit, but that doesn't fit with the line length being determined at the 'opposite' end of the line; there has to be some additional logic around that.
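
To make that guess concrete, here is a toy Python model of a ROM-decoded H signal. It is a sketch under several assumptions that are not established in this thread: the 7-bit counter is taken to step once every 4 cycles (512/128), the DE boundaries from troed's post are reused directly as H set/reset points, the HSYNC cut-off uses the PAL position only, and wakestates, HSYNC/HBLANK outputs and counter reset are ignored. All names are hypothetical.

```python
FREQ_50, FREQ_60 = 0, 1
RES_LOWMED, RES_HIGH = 0, 1

STEP = 4                   # assumed: one counter step per 4 cycles
HSYNC_STEP = 468 // STEP   # PAL position; NTSC's 464 is ignored here

# H set/reset cycles per register combination (boundaries from this thread).
H_EDGES = {
    (FREQ_50, RES_LOWMED): (60, 380),
    (FREQ_60, RES_LOWMED): (56, 376),
    (FREQ_50, RES_HIGH): (8, 168),
    (FREQ_60, RES_HIGH): (8, 168),
}


def rom_address(counter, freq, res):
    """9-bit address: 7 counter bits, then syncmode, then resolution."""
    return counter | (freq << 7) | (res << 8)


def build_rom():
    """Each entry holds the (set, reset) inputs of the H flip-flop."""
    rom = [(False, False)] * 512
    for (freq, res), (on, off) in H_EDGES.items():
        for counter in range(128):
            set_h = counter == on // STEP
            reset_h = counter in (off // STEP, HSYNC_STEP)
            rom[rom_address(counter, freq, res)] = (set_h, reset_h)
    return rom


ROM = build_rom()


def line_bytes(regs_at):
    """Run one line; regs_at(counter) returns the (freq, res) seen at that step."""
    h = False
    steps_with_h = 0
    for counter in range(128):
        freq, res = regs_at(counter)
        set_h, reset_h = ROM[rom_address(counter, freq, res)]
        if reset_h:
            h = False
        elif set_h:
            h = True
        steps_with_h += h
    return steps_with_h * 2   # 2 bytes per 4-cycle step


def pal(counter):
    return (FREQ_50, RES_LOWMED)


def no_right_border(counter):
    # 60 Hz only at the step where the 50 Hz reset would occur (cycle 380)
    return (FREQ_60 if counter == 380 // STEP else FREQ_50, RES_LOWMED)


print(line_bytes(pal))              # 160
print(line_bytes(no_right_border))  # 204
```

The point of the sketch is that register switches fall out for free: changing FREQ or RES just selects a different part of the ROM, so a set or reset point is missed or hit without any special-case logic.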
Last edited by Dio on Fri Sep 27, 2013 10:17 pm, edited 1 time in total.
troed
Atari God
Posts: 1460
Joined: Mon Apr 30, 2012 6:20 pm
Location: Sweden

Re: horizontal scrolling on ST

Post by troed »

Dio wrote:I think you're being much too complicated over H and DE. I'd be really surprised if it's not (DE = H and V).
I think you're closer to the solution there with "GLUE cycles" spanning a 4 cycle uncertainty if there's a better explanation to be had, but neither the trace diagrams nor your comment on HSYNC-to-DE for the different wakestates add up unless DE is only activated/deactivated on "boundaries" (8,56,60,168,376,380,468) that are as late as the latest possible RES check (in WS2). This is easily seen near cycle 56/58 where there's a FREQ check and a RES check (the same happens at 376/378) that can both be used to modify DE - but it in turn isn't offset by two cycles.

The state machine I posted is a model. The explanation can surely be that GLUE raises DE but it in turn isn't "picked up" until 0-4 cycles later - resulting in the same external behaviour.
Dio
Captain Atari
Posts: 451
Joined: Thu Feb 28, 2008 3:51 pm

Re: horizontal scrolling on ST

Post by Dio »

I think that's exactly what happens. It looks to me like the MMU queries the status of DE at the start of each CPU phase to determine what the next video phase will be, and we know there's no fixed alignment between the two.

That would imply a latency of 2-5 cycles, but we see 3-6; so either it queries DE slightly earlier (opposite edge of the clock, or aligned with the 16MHz clock) or it queries on the downward clock edge and DE doesn't actually arrive for a few extra ns (which is perfectly possible: if the MMU handles all its timing off the 16MHz clock, Glue is at least one flip-flop propagation delay behind the MMU due to the propagation delay in the clock divider).
troed
Atari God
Posts: 1460
Joined: Mon Apr 30, 2012 6:20 pm
Location: Sweden

Re: horizontal scrolling on ST

Post by troed »

Dio wrote:I think that's exactly what happens. It looks to me like the MMU queries the status of DE at the start of each CPU phase to determine what the next video phase will be, and we know there's no fixed alignment between the two.
Excellent, I'll edit the post to clarify that in its context DE means the externally picked up signal and H is the internally generated one.
Steven Seagal
Fuji Shaped Bastard
Posts: 2018
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed

Re: horizontal scrolling on ST

Post by Steven Seagal »

@Troed:

Working on such a pseudo-program is of course very useful. It's still incomplete, for example:

- We also need support for 4bit scrolling, by ST-CNX or the new one by Paolo (bees), with instability problems!
- there's also RES=1 + Med res overscan: sometimes there's a bitplane shift
- There's a difference between STE and STF for some 0-byte lines (Nostalgia/Lemmings: 28/44 for STF 32/48 for STE)
- STE +20 line?

It could be that we need two separate programs for STF & STE, that would improve readability.
In the CIA we learned that ST ruled
Steem SSE: http://sourceforge.net/projects/steemsse
troed
Atari God
Posts: 1460
Joined: Mon Apr 30, 2012 6:20 pm
Location: Sweden

Re: horizontal scrolling on ST

Post by troed »

Steven Seagal wrote:It's still incomplete, for example:

- We also need support for 4bit scrolling, by ST-CNX or the new one by Paolo (bees), with instability problems!
- there's also RES=1 + Med res overscan: sometimes there's a bitplane shift
Both those would be solved (I believe) with the Shifter state machine - the one I've described is purely GLUE (although maybe the GLUE-DE -> MMU detection should be broken out).
- There's a difference between STE and STF for some 0-byte lines (Lemmings: 28/44 for STF 32/48 for STE)
Both those would cause BLANK=FALSE at cycle 32 to fail and thus create a 0-byte + BLANK result. The used timing might be different for STF and STE in the demo but I don't think it matters ;)
- STE +20 line?
Is this the 178 and 180 byte lines Leonard mentioned in the 68000 cycles thread? If so I never managed to figure out from that description whether they really existed. I'm very much open to the possibility that the difference in DE-activation between STE and STF (due to pre-fetching for hardware scroll) could make it possible to create additional line lengths. Where are they used?
It could be that we need two separate programs for STF & STE, that would improve readability.
Agreed. I might've gone too far into trying to make it readable (again, breaking out when the MMU detects DE would help with that as well).

I will make additional edits during the weekend, there's one already planned. The checks at 56/58 and 376/378 are actually single checks (if X and Y) for both FREQ and RES but there's a delay that I want to express clearly as well.

Thanks for commenting.

/Troed
Last edited by troed on Sat Sep 28, 2013 9:33 am, edited 1 time in total.
Steven Seagal
Fuji Shaped Bastard
Posts: 2018
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed

Re: horizontal scrolling on ST

Post by Steven Seagal »

troed wrote:
- STE +20 line?
Is this the 178 and 180 byte lines Leonard mentioned in the 68000 cycles thread?
I don't know about this, but this is an overscan without stabiliser used in More or less zero, it's commented in both Steem and Hatari (LINE_PLUS_20).
troed
Atari God
Posts: 1460
Joined: Mon Apr 30, 2012 6:20 pm
Location: Sweden

Re: horizontal scrolling on ST

Post by troed »

Steven Seagal wrote: I don't know about this, but this is an overscan without stabiliser used in More or less zero, it's commented in both Steem and Hatari (LINE_PLUS_20).
"Add support for STE 224 bytes overscan without stabiliser by switching hi/lo at cycle 504/4 to remove left border (fix More Or Less Zero and Cernit Trandafir by DHS, as well as Save The Earth by Defence Force)"
(Hatari video.c)

Right, silly me, I knew there was an STE overscan with a different total line length. I haven't given it much thought at all to be honest - but it seems to be close (edit: exactly*) to what Leonard described (earlier left border switch).

I'll read up on it and see whether we have the knowledge needed for a state machine description of what causes it. Thanks.

(Currently it has me confused since it's precisely at HSYNC end)

*) I'd say Leonard found these in 2006 :P http://www.atari-forum.com/viewtopic.ph ... 7&start=24 It seems to be a 20 byte left border which would explain the 180 line, and combined with a short right it would make 178.

Simple cycle counting provides a hypothesis as to how it can happen, but ideally we'd need trace diagrams ... ;) Delaying HSYNC end by 12 cycles delays left border DE activation by 12 cycles = 6 bytes. Also, according to DHS the whole screen is shifted 8 pixels to the right - which indicates Shifter involvement here. Uhm, but Nicolas writes in video.c that the screen is shifted 8 pixels to the left.

As to why they're STE only I have no idea. Fluke difference in how DE is raised (or detected).

(Yes this post just keeps on getting edited)

Can the STE hardware scroll in high res as well? If so, it needs 16 cycles (?) earlier pre-fetch compared to ST for high res as well - which should mean that the left border switch is earlier (which is true - "late" ST left border code, like ours in SYNC - doesn't work on STE) and DE is earlier. 8 - 16 = -8, which wraps around to cycle 504 of the previous 512-cycle line, so it sort of matches up, although it doesn't explain why a left border switch at 0 works. I wonder if this means that the STE HSYNC pulse is shorter. Also, maybe the prefetch is only 4 cycles for high res - I'm just speculating here and haven't studied it enough. STE is "after my time", you know ;)
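
The "8 - 16" step above is plain modular arithmetic on the line length. A tiny Python sketch (the helper name is made up; 512 is the PAL line length in cycles used earlier in the thread):

```python
def wrap_cycle(cycle, line_cycles=512):
    """Map a cycle offset onto 0..line_cycles-1; negative offsets land on the previous line."""
    return cycle % line_cycles


print(wrap_cycle(8 - 16))  # 504
```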

http://atari-ste.anvil-soft.com/html/devdocu2.htm