MMU control of CPU/BLiTTER versus Shifter access cycles

mc6809e
Captain Atari
Posts: 159
Joined: Sun Jan 29, 2012 10:22 pm

MMU control of CPU/BLiTTER versus Shifter access cycles

Postby mc6809e » Sun Jan 29, 2012 11:07 pm

I recently came across this document on cycle counting:

http://pasti.fxatari.com/68kdocs/AtariSTCycleCounting.html

It contains this line: "A [cpu] bus cycle accessing the internal MMU data bus will always perform aligned, in relation to the previous access on the same bus, at a four cycles [cpu cycles] boundary."

So basically this means the CPU gets the even memory bus cycles and Shifter gets the odd cycles. That works since the CPU takes 4 cycles to access memory and a memory bus cycle is 2 CPU cycles. A nice interleaving of accesses occurs. The CPU usually doesn't have to wait.

A couple of questions: what about the BLiTTER? Does it also get every other cycle? It shares the CPU's bus. That would explain the 4MBps number for a clear operation. If every bus cycle takes 2 CPU cycles, and the BLiTTER gets every even cycle, same as the CPU, then there are 2 million writes of 2 bytes per second during a clear. And with the HOG bit off, the CPU and BLiTTER take turns using 64 of the even memory cycles available on that shared bus. So the switch happens every 64 * 4 = 256 CPU cycles.
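As a quick sanity check on the arithmetic above, here is a minimal sketch. It assumes a nominal clock of exactly 8 MHz and a 16-bit transfer per access; the constant and function names are made up for illustration.

```python
# Back-of-the-envelope check of the numbers in the post above.
CPU_HZ = 8_000_000        # assumed nominal CPU clock
CYCLES_PER_ACCESS = 4     # one CPU/BLiTTER access every 4 CPU cycles
BYTES_PER_ACCESS = 2      # 16-bit data bus
BLITTER_SLICE = 64        # bus accesses per turn with HOG off (figure from the post)

def accesses_per_second(cpu_hz=CPU_HZ):
    """Accesses per second available on the shared CPU/BLiTTER side of the bus."""
    return cpu_hz // CYCLES_PER_ACCESS

def clear_bandwidth(cpu_hz=CPU_HZ):
    """Bytes per second if every available access is a 16-bit write."""
    return accesses_per_second(cpu_hz) * BYTES_PER_ACCESS

def slice_cpu_cycles():
    """CPU cycles between bus hand-overs when CPU and BLiTTER take turns."""
    return BLITTER_SLICE * CYCLES_PER_ACCESS

print(accesses_per_second(), clear_bandwidth(), slice_cpu_cycles())
```

Under those assumptions this reproduces the 2 million writes per second, the 4MBps clear rate and the 256-cycle hand-over interval.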

But if Shifter isn't accessing memory, during VBlank or HBlank for example, is there some way to get access to those free cycles? The quote above seems to suggest that it isn't possible.

mc6809e
Captain Atari
Posts: 159
Joined: Sun Jan 29, 2012 10:22 pm

Re: MMU control of CPU/BLiTTER versus Shifter access cycles

Postby mc6809e » Mon Jan 30, 2012 3:52 am

One interesting thing about the 256 CPU cycle number: it's the smallest power of 2 of cycles that covers the cycle count for all CPU instructions. The DIVS instruction, for example, takes as many as 156 cycles to execute.

Division is a common operation in 3D calculations. Ideally, then, when combining blitter operations with such calculations, you want to arrange things so that each DIVS starts just before the blitter reclaims the bus from the CPU, so that the CPU can run the division internally while the blitter does its work. The same goes for MULs, though they benefit less from the concurrency.

Cyprian
10 GOTO 10
Posts: 1858
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: MMU control of CPU/BLiTTER versus Shifter access cycles

Postby Cyprian » Mon Jan 30, 2012 1:41 pm

ST memory bandwidth is 8 MB/s: 4 million memory cycles per second.
2 million per second are granted to Shifter/sound DMA, the other 2 million per second to CPU/Blitter/FDD/HDD DMA.
The CPU/Blitter cannot take Shifter's cycles because the MMU blocks access to them (even when those cycles are not in use).
Every CPU/Blitter memory access to ST RAM is always rounded up to 4 cycles.
A 6-cycle 68000 instruction is therefore rounded up to 8 cycles, or paired with another instruction (more info on the pasti site).
Portfolio / Lynx II / Jaguar / TT030 / Mega STe / 800 XL / 1040 STe / Falcon030 / 65 XE / 520 STm / SM124 / SC1435
SDrive / PAK68/3 / Lynx Multi Card / LDW Super 2000 / XCA12 / SkunkBoard / CosmosEx / SatanDisk / UltraSatan / USB Floppy Drive Emulator / Eiffel / SIO2PC / Crazy Dots / PAM Net
Hatari / Steem SSE / Aranym / Saint
http://260ste.appspot.com/

Dio
Captain Atari
Posts: 451
Joined: Thu Feb 28, 2008 3:51 pm

Re: MMU control of CPU/BLiTTER versus Shifter access cycles

Postby Dio » Mon Jan 30, 2012 5:12 pm

In the details it gets pretty complex. The rough rules are:

- There are two main data bus domains in the ST, the DRAM bus and the CPU bus.
- The two buses work together for CPU, DMA and Blitter DRAM accesses and independently at all other times. This is managed by a bus gateway controlled by the MMU.
- In each 0.5us period half is allocated to the MMU / Shifter ('MMU phase') and the other half to the CPU / DMA / Blitter ('CPU phase').
- The MMU phase contains either a DRAM refresh or a Shifter video access. There is no way to disable the refresh and force it to hand the phase to the CPU instead.
- When the CPU or another CPU-bus device starts a memory access, if the access is to DRAM and would not be aligned with the CPU phase, two wait states are inserted (by delay of DTACK) to align the CPU access with the CPU phase.
- The Blitter duplicates the CPU asynchronous bus protocol, so behaves (nominally) like the 68000.
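The alignment rule in the list above can be sketched as a toy model. This is only an illustration under loud assumptions: each instruction is reduced to a single DRAM access (its opcode fetch) at its start, the CPU phase is assumed to begin at cycle 0 of each 4-cycle period, and the names `wait_states` and `run` are hypothetical.

```python
# Toy model of the MMU aligning CPU-bus DRAM accesses to the CPU phase.
PERIOD = 4  # CPU cycles per 0.5 us period at a nominal 8 MHz

def wait_states(start_cycle, cpu_phase_offset=0):
    """Cycles of delay before an access starting at start_cycle lines up
    with the CPU phase. cpu_phase_offset is an assumption, not a measured value."""
    misalignment = (start_cycle - cpu_phase_offset) % PERIOD
    return 0 if misalignment == 0 else PERIOD - misalignment

def run(instruction_cycles):
    """Total elapsed cycles for a sequence of instruction lengths, where each
    instruction begins with one DRAM access that must be aligned."""
    clock = 0
    for cycles in instruction_cycles:
        clock += wait_states(clock)  # DTACK delayed until the CPU phase
        clock += cycles
    return clock

print(run([4, 4, 4]))  # already aligned, no waits
print(run([6, 6]))     # the second fetch waits 2 cycles
```

Since 68000 instruction timings are even, the misalignment in this model is either 0 or 2 cycles, which matches the "two wait states" above, and a 6-cycle instruction followed by another fetch effectively costs 8. Real instructions make several accesses, so this only shows the principle.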

mc6809e
Captain Atari
Posts: 159
Joined: Sun Jan 29, 2012 10:22 pm

Re: MMU control of CPU/BLiTTER versus Shifter access cycles

Postby mc6809e » Mon Jan 30, 2012 7:20 pm

Dio wrote:In the details it gets pretty complex. The rough rules are:

- The MMU phase contains either a DRAM refresh or a Shifter video access. There is no way to disable the refresh and force it to hand the phase to the CPU instead.


That's too bad. Even with refresh enabled there should be something like 50 odd bus cycles available after HBlank. If the blitter had the even cycles and the CPU the odd cycles remaining during HBlank then the HOG bit could be left on and the CPU would still get something. And then there are the 60+ lines where Shifter is off completely...

I see it's not possible to take advantage of these, though. Thanks for the information.

EvilFranky
Atari Super Hero
Posts: 872
Joined: Thu Sep 11, 2003 10:49 pm
Location: UK

Re: MMU control of CPU/BLiTTER versus Shifter access cycles

Postby EvilFranky » Mon Jan 30, 2012 8:17 pm

How does the Amiga blitter work in comparison to this?

Sent from my GT-I9100 using Tapatalk

mc6809e
Captain Atari
Posts: 159
Joined: Sun Jan 29, 2012 10:22 pm

Re: MMU control of CPU/BLiTTER versus Shifter access cycles

Postby mc6809e » Tue Jan 31, 2012 4:58 am

EvilFranky wrote:How does the Amiga blitter work in comparison to this?


My understanding is that the Amiga blitter/CPU can use any extra cycles (including odd cycles) but in some cases doesn't. A clear operation with the blitter, for example, does a write every other available memory cycle even when the display is turned off and extra cycles are available. A copy operation, on the other hand, will use all available cycles, so it gets a little extra speed while the system is displaying the border areas where no display fetch occurs. Of course the whole system runs at a 12% slower clock than the ST. I'm not sure how much the extra memory accesses make up for the difference.

Cyprian
10 GOTO 10
Posts: 1858
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: MMU control of CPU/BLiTTER versus Shifter access cycles

Postby Cyprian » Tue Jan 31, 2012 10:14 am

mc6809e wrote:My understanding is that the Amiga blitter/CPU can use any extra cycles (including odd cycles) but in some cases doesn't


The CPU in the Amiga 500 behaves the same as in the ST: every instruction is rounded to 4 cycles. Only the A500 blitter can use all free cycles (memory slots).
In the A1200 the CPU is rounded to 8 cycles (1.77 million cycles/memory slots per second).

Dio
Captain Atari
Posts: 451
Joined: Thu Feb 28, 2008 3:51 pm

Re: MMU control of CPU/BLiTTER versus Shifter access cycles

Postby Dio » Tue Jan 31, 2012 10:53 am

mc6809e wrote:That's too bad. Even with refresh enabled there should be something like 50 odd bus cycles available after HBlank. If the blitter had the even cycles and the CPU the odd cycles remaining during HBlank then the HOG bit could be left on and the CPU would still get something. And then there are the 60+ lines where Shifter is off completely...

This wouldn't be possible without a more expensive design.

Although the actual DRAM bus is only required for two 8 MHz CPU cycles to execute a memory access, the 68000 protocol means the CPU bus is required for longer than that: each access requires at least three cycles. The MMU handles this using the bus gateway. It has a data latch for CPU reads, effectively converting the fast-page memory into something more like EDO RAM. For writes, it latches the data into the DRAM very early in the DRAM clock cycle, taking advantage of the data appearing inside the first couple of cycles, so even though the 68000 is still signalling the write on the third clock, the MMU and DRAM have already moved on.

The ST is very aggressive with its DRAM timing. It clearly breaks at least one timing parameter: the back-to-back cycle time for most of the memory types used is a bit over the actual slightly-under-250ns cycle time. For example, for the Fujitsu RAM common in early STs the cycle time minimum is quoted as 265ns. A few of the other parameters and signals are driven awfully close and exploited to the max as well. dadhacker quoted one of the ST designers as saying "You see, DRAMs are analogue devices really..."
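To put a number on how far out of spec that is, here is a small sketch. It assumes a nominal 8 MHz clock (the real clock is slightly faster, hence "slightly under" 250ns) and two CPU cycles per DRAM cycle; the function names are hypothetical.

```python
# How far the ST's DRAM cycle falls short of a quoted minimum cycle time.
def dram_cycle_ns(cpu_hz=8_000_000, cpu_cycles_per_dram_cycle=2):
    """Length of one DRAM cycle in nanoseconds (nominal clock assumed)."""
    return cpu_cycles_per_dram_cycle * 1e9 / cpu_hz

def shortfall_ns(spec_min_ns, cpu_hz=8_000_000):
    """Positive result means the machine cycles the DRAM faster than the spec allows."""
    return spec_min_ns - dram_cycle_ns(cpu_hz)

print(dram_cycle_ns())      # nominal DRAM cycle time
print(shortfall_ns(265.0))  # margin missed for the 265ns part quoted above
```

With the 265ns figure quoted for the Fujitsu parts, the nominal shortfall comes out at about 15ns, roughly 6% of the quoted minimum.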

So it wouldn't be possible to interleave the CPU and Blitter onto the CPU bus - the Blitter would have to have its own bus and bus gateway and a different arbitration protocol. Since the Blitter wasn't conceived when the ST was first created, but significantly predates the STE, this would have meant revising MMU and probably Glue, and Atari shied away from that - there were no custom chip changes from the -38A Glue until the STE came along.

During the visible screen, no refreshes are actually needed at all, but when V is inactive you need to average about five refreshes a line to stay in spec. The ST does the simplest thing and just refreshes on every MMU phase if it's not a video access. Probably costs a bit in power consumption, but doesn't change the peak, just the average, so doesn't add any cost.
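A rough sketch of the refresh budget, with every input an assumption rather than a figure from this thread: 128 row refreshes needed every 2 ms (a typical requirement for DRAMs of that era, check the datasheet for a specific part), 512 CPU cycles per scanline, and a nominal 8 MHz clock. The function names are made up.

```python
# Refreshes needed per scanline versus MMU phases available per scanline.
def refreshes_needed_per_line(rows=128, interval_s=2e-3,
                              line_cycles=512, cpu_hz=8_000_000):
    """Average row refreshes per scanline needed to cover all rows in interval_s."""
    line_time_s = line_cycles / cpu_hz
    return rows * line_time_s / interval_s

def mmu_phases_per_line(line_cycles=512, period=4):
    """MMU phases in one scanline, one per 4-cycle period."""
    return line_cycles // period

print(refreshes_needed_per_line())
print(mmu_phases_per_line())
```

Under these assumptions the requirement is a little over four refreshes per line, the same ballpark as the "about five" above, while a line with no video fetches has 128 MMU phases, so refreshing on every idle phase overshoots the requirement by a wide margin.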

The best that could have been offered at no cost would be to remove the need to delay the CPU or blitter to align with the CPU phase if the MMU was known to be inactive. It's not clear to me how hairy this would have been; it would obviously have meant a more complex logic for handling refresh, and it's possible that it would have added further complexity in the MMU timing because it would have broken some assumptions. In addition this would also have meant that ST code would not easily be executed at predictable speed, although a register bit could perhaps have been added to disable the behaviour.

mc6809e
Captain Atari
Posts: 159
Joined: Sun Jan 29, 2012 10:22 pm

Re: MMU control of CPU/BLiTTER versus Shifter access cycles

Postby mc6809e » Sun Feb 26, 2012 5:19 am

Dio wrote:The ST is very aggressive with its DRAM timing. It clearly breaks at least one timing parameter: the back-to-back cycle time for most of the memory types used is a bit over the actual slightly-under-250ns cycle time. For example, for the Fujitsu RAM common in early STs the cycle time minimum is quoted as 265ns. A few of the other parameters and signals are driven awfully close and exploited to the max as well. dadhacker quoted one of the ST designers as saying "You see, DRAMs are analogue devices really..."

[snip]

During the visible screen, no refreshes are actually needed at all, but when V is inactive you need to average about five refreshes a line to stay in spec. The ST does the simplest thing and just refreshes on every MMU phase if it's not a video access. Probably costs a bit in power consumption, but doesn't change the peak, just the average, so doesn't add any cost.


I really enjoy these technical details. Thanks.

I wonder if the constant refreshing might be compensating for problems that occur with the out-of-spec access times. That might explain the "analog" devices comment related to us from "dad's" conversation with an Atari engineer. By keeping all capacitors fully charged through constant refreshing, the device's sense amps don't need as much time to settle and can be latched a bit early. Sure, constant refreshes steal bandwidth, but maybe saving money on slower and cheaper DRAM was worth it. And you get the benefit of an 8MHz clock.

