horizontal scrolling on ST

Dio · Post by **Dio** » Fri Oct 04, 2013 9:43 pm

troed wrote:Thanks for the explanation. Does this mean the MMU consults a memory map to know whether it needs to do this for a specific memory address or not?

My expectation is that MMU owns DTACK for everything going through the gateway and the timing of it is run by the same state machine that manages RDAT, WDAT, LATCH, LOAD etc.

So it's Glue's decoding logic that indicates (I think on the /RAM and/or /DEV signals) to MMU that it's MMU's responsibility to handle the operation and MMU takes it from there.

troed · Post by **troed** » Fri Oct 04, 2013 10:10 pm

ljbk wrote:If you look at the source available at DHS, you will see that there is only one STE patch related to line sizes. It is related to the 0 bytes case: 60Hz has to be set at emulator cycle 40 and we go back to 50Hz at emulator cycle 52.
So I assume that the +2 case works the same way on STE as on STF for the (old) wakeup state detected on STE by that program (probably 1).

Indeed - if we look at the NTSC and PAL starting positions in the state machines:

STF:
52 IF(FREQ == 60) && (RES == LO) H = TRUE
54 IF(FREQ == 50) LINE = PAL
56 IF(FREQ == 50) && (RES == LO) H = TRUE

STE:
36 IF(FREQ == 60) && (RES == LO) H = TRUE
40 IF(FREQ == 50) && (RES == LO) H = TRUE
56 IF(FREQ == 50) LINE = PAL
58 Also related to line length similar to above for 50/60Hz. Unknown cause.

We see that it's very similar - but due to the STE's hardware scroll capability it needs to pre-load 16 pixels, so the signal to MMU is raised 16 cycles earlier and the complexities of the two-cycle timing, with the LINE-length position in between, are gone. What I wanted to figure out tonight was the STE equivalent to STF cycle 54, that is, the position where the line length in cycles (508 or 512, and hi res timings as well) is taken.

I did not expect the answer to be two positions with slightly different behaviour.

(Btw, contrary to what I have believed and posted before, STE +2 can be created not only with 36/xx but also with 34/xx, 32/xx and 30/xx ... as the state machine indicates.

The only thing that's important is to be at 60Hz at cycle 36 and back at 50Hz at cycle 56.)
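
To make the listings above concrete, here is a minimal Python sketch that evaluates the STF and STE checks against a frequency timeline. It only illustrates the state machine as posted: the names `run_checks` and `at_60hz_between` are made up for this example, the cycle 58 entry is left out because its cause is unknown, and treating LINE as "not PAL" until the check sets it is an assumption.

```python
# Checks transcribed from the listings above:
# (cycle, required FREQ, requires RES == LO, effect)
STF = [(52, 60, True, "H"), (54, 50, False, "PAL"), (56, 50, True, "H")]
STE = [(36, 60, True, "H"), (40, 50, True, "H"), (56, 50, False, "PAL")]


def run_checks(checks, freq_at, res="LO"):
    """Return (cycle where H first became TRUE or None, whether LINE = PAL was set)."""
    h_start, pal = None, False  # assumption: LINE is not PAL until a check sets it
    for cycle, freq, needs_lores, effect in checks:
        if freq_at(cycle) != freq or (needs_lores and res != "LO"):
            continue  # condition not met at this cycle
        if effect == "H" and h_start is None:
            h_start = cycle
        elif effect == "PAL":
            pal = True
    return h_start, pal


def at_60hz_between(start, end):
    """Hypothetical helper: 60Hz for start <= cycle < end, 50Hz otherwise."""
    return lambda cycle: 60 if start <= cycle < end else 50


if __name__ == "__main__":
    print("STF, plain 50Hz line:", run_checks(STF, lambda c: 50))
    print("STE, plain 50Hz line:", run_checks(STE, lambda c: 50))
    for start in (30, 32, 34, 36):
        print("STE, 60Hz from", start, ":", run_checks(STE, at_60hz_between(start, 44)))
```

In this model every switch to 60Hz at 30, 32, 34 or 36 that is still held at cycle 36 gives the same result: H goes TRUE at 36 instead of 40 and the line is still taken as PAL at 56.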

Dio wrote:So it's Glue's decoding logic that indicates (I think on the /RAM and/or /DEV signals) to MMU that it's MMU's responsibility to handle the operation and MMU takes it from there.

Thanks. This logic has always fascinated me but I never took the time to learn it. Appreciated.

Steven Seagal · Post by **Steven Seagal** » Sun Oct 06, 2013 11:49 am

troed wrote: (Btw, contrary to what I have believed and posted before STE +2 can be created not only with 36/xx but also with 34/xx, 32/xx and 30/xx ... as the state machine indicates The only thing that's important is to be at 60Hz at cycle 36 and back at 50Hz at cycle 56)

See, you got confused by those STE lines +2 too.
Before the latest fix, Steem would also only take a change at cycle 36. It seems to work because the rare cases (Forest STE and ?) use this timing.
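
The difference between looking for a switch made exactly at cycle 36 and looking at the frequency state at cycle 36 can be sketched as below. This is a hypothetical Python illustration, not Steem's code: the function names and the `(cycle, freq)` list format are assumptions made for the example.

```python
def freq_at(switches, cycle, initial=50):
    """Frequency in effect at `cycle`, given (cycle, freq) writes; starts at 50Hz."""
    freq = initial
    for when, value in sorted(switches):
        if when <= cycle:
            freq = value
    return freq


def plus2_by_switch_event(switches):
    """Event-style test: only a write of 60Hz made exactly at cycle 36 counts."""
    switched_at_36 = any(when == 36 and value == 60 for when, value in switches)
    return switched_at_36 and freq_at(switches, 56) == 50


def plus2_by_state(switches):
    """State-style test: 60Hz in effect at cycle 36, 50Hz in effect at cycle 56."""
    return freq_at(switches, 36) == 60 and freq_at(switches, 56) == 50


if __name__ == "__main__":
    for case in ([(36, 60), (56, 50)], [(32, 60), (44, 50)]):
        print(case, plus2_by_switch_event(case), plus2_by_state(case))
```

Both tests agree on a 36/56 switch, but only the state-style test accepts a switch made at 32, which is the case the event-style test misses.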

Steven Seagal · Post by **Steven Seagal** » Sun Oct 06, 2013 12:01 pm

troed wrote: (And if it preloads for 16 cycles in high res as well then it actually preloads for 8 cycles before even knowing if there's a screen to start displaying. THAT would look strange in a trace .. )

edit: Maybe I should point out that I assume horizontal scroll is always 16 pixels in all resolutions and thus pre-fetch should need one word (4 cycles) in high res, two words (8) in medium and four (16) in low. That works out well with the timings as well, so, no strange traces. It was hypothetical.

/Troed

I'm sure it will come down to this because it's the best explanation for the timing difference in STE's left border off trick. Like you say 16 cycles off would make strange traces.
It's not prefetch but the scrolling itself after prefetch that takes #cycles needed for one raster.
Prefetch is always 16 cycles, but then there's scrolling - of one raster, then the still strange latency then rendering.

troed · Post by **troed** » Sun Oct 06, 2013 10:35 pm

Steven Seagal wrote:See, you got confused by those STE lines +2 too.
Before the latest fix, Steem would also only take a change at cycle 36. It seems to work because the rare cases (Forest STE and ?) use this timing.

I guess you could say that.

It wasn't until I started documenting the state machines that I realized that there seemed to be no reason for STE to do anything between the "just blank" (no pixels displayed) lines that end at FREQ 28* and cycle 36 for +2 as documented in the 2006 thread. Then it struck me that the tests MiggyMog did for Paulo began at cycle 36. So cycle 32 (or 34, 30) was never tested at all, see http://www.atari-forum.com/viewtopic.ph ... 7&start=81

(So I did, a few days ago. I will continue testing the last few remaining bits of STE behaviour that might differ from the extensive tests Paulo gave us with the Excel sheet.)

Steven Seagal wrote:I'm sure it will come down to this because it's the best explanation for the timing difference in STE's left border off trick. Like you say 16 cycles off would make strange traces.
It's not prefetch but the scrolling itself after prefetch that takes #cycles needed for one raster.
Prefetch is always 16 cycles, but then there's scrolling - of one raster, then the still strange latency then rendering.

Here's my current thinking on pre-fetch and why it creates +20 (left border) as well as +4/+6 (regular lines). I don't think it's because of GLUE and I don't think it's the Shifter either at the moment. MMU is the one that decides the values of the current screen address and seems to be the only one where the timing and signals fit.

As always - comments are very welcome. All hypotheses need to be tested and if they survive they become gospel.

On STE the check to see if we should start a hires (= left border) screen is done 4 cycles earlier than on ST. This matches the 16 cycle difference for lores.

0 IF(RES == HI) H = TRUE

The reason for this is pre-fetch, needed to be able to hardware scroll. It's always done, but not used unless other STE-specific registers are set. Thus the MMU receives this information and starts LOADing up the Shifter. One word per 4 cycles. For hires, one word is all that's needed for 16 pixels - in lores four words are needed (16 cycles, 8 bytes).
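
The word counts follow from the number of bitplanes: 16 pixels take one word per bitplane. A small Python sketch of that arithmetic, assuming the one-word-per-4-cycles LOAD rate stated above (the function name `prefetch` is just for illustration, and the medium res figure comes from the earlier post in this thread):

```python
PIXELS_PREFETCHED = 16   # always 16 pixels, per the hypothesis above
CYCLES_PER_WORD = 4      # one LOAD per 4 cycles
BYTES_PER_WORD = 2
BITPLANES = {"HI": 1, "MED": 2, "LO": 4}


def prefetch(res):
    """Return (words, cycles, bytes) needed to pre-fetch 16 pixels in `res`."""
    words = BITPLANES[res] * PIXELS_PREFETCHED // 16  # one word per plane per 16 pixels
    return words, words * CYCLES_PER_WORD, words * BYTES_PER_WORD


if __name__ == "__main__":
    for res in ("HI", "MED", "LO"):
        print(res, prefetch(res))
```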

40 IF(FREQ == 50) && (RES == LO) H = TRUE

Let's start at lores. At cycle 40 the pre-fetch starts. 4 cycles later a word has been LOADed into the Shifter - but the actual screen memory address is not touched at this time. The MMU simply does not know if this will be used. If a switch to hires is made at cycle 44 however, the MMU finds itself in hires pre-fetch which is only 4 cycles long so it's already done. It thus updates the screen memory address with 6 bytes read - 8 minus the word already done.

The same happens at cycle 48 - two words have been LOADed into the Shifter which is well beyond a hires pre-fetch and when the resolution switch is made the screen memory address is updated with 4 bytes read - 8 minus the two words already done.

(I tried a short switch, avoiding 56, at cycle 52 but did not get +2 as I had hoped for)

At cycle 4, if we go back to lores, the MMU finds that there are still 12 cycles left to pre-fetch for lores. It's thus not until cycle 16 that the real signal for the Shifter to start displaying data is sent, as well as MMU updating the video counter registers: (56-16)/2 = 20.
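
The byte accounting in this hypothesis can be written out as a short Python sketch. It only restates the arithmetic above and is not a model of the real MMU: the function names are invented, and reading the 56 in (56-16)/2 as the normal lores display start (pre-fetch start 40 plus 16 pre-fetch cycles) is an assumption.

```python
CYCLES_PER_WORD = 4
BYTES_PER_WORD = 2
LORES_PREFETCH_WORDS = 4       # 16 cycles, 8 bytes
LORES_PREFETCH_START = 40      # STE: 40 IF(FREQ == 50) && (RES == LO) H = TRUE
CYCLES_PER_BYTE = 2            # lores, as in (56-16)/2


def bytes_added_on_hires_switch(switch_cycle):
    """Bytes added to the screen address when lores pre-fetch is cut short by hires."""
    words_loaded = (switch_cycle - LORES_PREFETCH_START) // CYCLES_PER_WORD
    return (LORES_PREFETCH_WORDS - words_loaded) * BYTES_PER_WORD


def left_border_extra_bytes(back_to_lores_cycle=4):
    """Extra bytes when hires is seen at cycle 0 and lores is restored at cycle 4."""
    lores_prefetch_cycles = LORES_PREFETCH_WORDS * CYCLES_PER_WORD      # 16
    remaining = lores_prefetch_cycles - back_to_lores_cycle             # 12
    display_start = back_to_lores_cycle + remaining                     # 16
    # Assumption: 56 is the normal lores display start (40 + 16).
    normal_display_start = LORES_PREFETCH_START + lores_prefetch_cycles
    return (normal_display_start - display_start) // CYCLES_PER_BYTE


if __name__ == "__main__":
    print(bytes_added_on_hires_switch(44), bytes_added_on_hires_switch(48))
    print(left_border_extra_bytes())
```

A switch at 44 gives 6, a switch at 48 gives 4, and the left border case gives 20, matching the +6, +4 and +20 figures discussed here.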

56 IF(FREQ == 50) LINE = PAL

Sounds like a reasonable hypothesis? It comes with a caveat: it requires the MMU to know about resolution, something it obviously can do, being merged with GLUE in STE, but it's not known behaviour and it differs from ST. Also, I use CPU cycle timing even though there's a delay between it and MMU reacting. Not sure how that deals with the overshoot-20 byte thing.

I'll update the wiki with this hypothesis if it survives a few days without obvious holes.

/Troed

*) I'm getting nowhere just pondering what it is that causes these, so when I have the STE values for FREQ and RES I'll add those and Paulo's ST values to the state machines.

Steven Seagal · Post by **Steven Seagal** » Mon Oct 07, 2013 7:15 pm

OK I've looked at part of the "state machine", and it makes sense.
In a later version of Steem, not v3.5.3 because it's risky and demands testing, some parts where we're still looking for specific switches should be replaced with checking the state.

For example, for one type of 0-byte line, we have something like this: