Here is the latest status
Let's first review my understanding of the mechanisms involved in transferring bytes from the FDC to memory through the DMA controller. Suppose you want to read a track: you first have to prepare the DMA for the transfer by providing it with a buffer address and a count. The address must point to a buffer big enough to contain all the data from the FDC (about 6500 bytes in the case of a read track), and the count indicates the maximum number of 512-byte chunks the DMA will have to transfer (for a read track we can use 20). Internally the DMA has two 16-byte buffers that are used alternately. This allows one buffer to be filled while the content of the other is transferred to memory.
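As an illustration, here is a minimal C sketch of that preparation, including issuing the read track command described below (register addresses and the classic register-select values, $90 for the sector count register and $80 for the FDC command/status register, are the usual documented ones; the FIFO-reset toggle, the write order of the address bytes, and the buffer name are assumptions on my part):

Code: Select all
#include <stdint.h>

#define DMA_DATA (*(volatile uint16_t *)0xFF8604L) /* FDC / sector count data */
#define DMA_MODE (*(volatile uint16_t *)0xFF8606L) /* mode (write) / status (read) */
#define DMA_HIGH (*(volatile uint8_t  *)0xFF8609L) /* DMA base address, high byte */
#define DMA_MID  (*(volatile uint8_t  *)0xFF860BL) /* DMA base address, mid byte */
#define DMA_LOW  (*(volatile uint8_t  *)0xFF860DL) /* DMA base address, low byte */

static uint8_t track_buffer[20 * 512];  /* big enough for a full track (~6500 bytes) */

static void prepare_dma_read_track(void)
{
    uint32_t addr = (uint32_t)track_buffer;

    DMA_LOW  = (uint8_t)addr;          /* low byte first, as in common ST disk code */
    DMA_MID  = (uint8_t)(addr >> 8);
    DMA_HIGH = (uint8_t)(addr >> 16);

    DMA_MODE = 0x0190;                 /* toggle the write bit to reset the DMA FIFO */
    DMA_MODE = 0x0090;                 /* read mode, sector count register selected */
    DMA_DATA = 20;                     /* count: 20 chunks of 512 bytes */
    DMA_MODE = 0x0080;                 /* read mode, FDC command/status register */
    DMA_DATA = 0x00E0;                 /* WD1772 read track command */
}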
The attached figure shows the sequence of events when reading a track through the DMA. Note that the scale is wildly inaccurate, but this is acceptable as the purpose of the diagram is just to show the sequence of events.
After the FDC receives the read track command, it waits for the index pulse. Shortly after the drive reaches the index pulse the read can start, and the FDC raises DRQ to indicate that one byte has been assembled and is ready to be fetched. When the DMA receives this signal it starts a fetch cycle to grab the byte from the FDC over the private bus that joins the DMA and the FDC. The byte is then stored in one of the DMA FIFOs, and the floppy controller lowers DRQ until the next byte is assembled. When this happens a new transfer takes place, and this goes on until 16 bytes have been stored in the DMA FIFO. At this point the DMA issues a bus request to the 68000 to indicate that it wants to perform a DMA transfer. When the bus request is granted by the 68000, the DMA takes over the system bus and transfers the 16 bytes from the FIFO to the memory pointed to by its address register. At the end of the transfer the DMA releases the bus to the processor and increments its internal address register by 16.
Here are, roughly, the timing values involved in the transfer: the time between the reception of the command and the IP can be anything up to 200 ms (one revolution). A byte is assembled by the FDC every 32 µs (8 bits of 4 µs each), and therefore 16 bytes are stored in the DMA for burst transfer roughly every 512 µs.
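For reference, the same nominal figures expressed as constants (DD floppy spinning at 300 rpm):

Code: Select all
/* Nominal timing derived from the figures above */
#define REV_TIME_MS    200                      /* one revolution at 300 rpm */
#define BIT_TIME_US    4                        /* one bit cell */
#define BYTE_TIME_US   (8 * BIT_TIME_US)        /* 32 us per assembled byte */
#define CHUNK_TIME_US  (16 * BYTE_TIME_US)      /* 512 us per 16-byte burst */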
Now let's get back to writing a routine to measure the time it takes for a byte to be assembled by the FDC. Measuring this time can be very useful, for example to find out if bit width variation has been used for protection purposes.
For that matter we are going to use one of the 68901 MFP timers. Usually Timer A is a good choice. The MFP is connected to a 2.4576 MHz crystal and offers several pre-scaling factors. It will become obvious later on that the best choice is a pre-scale of 10. This gives a frequency of 245.76 kHz and a period of 4.0690104167 µs (4069 ns is a good integer approximation). As the timer register is 8 bits wide, an overflow happens every 256 ticks, or about every 1042 µs.
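In C, the timer setup could look like this (MFP register addresses as documented; the control value %0010 selects delay mode with the /10 pre-scale):

Code: Select all
#include <stdint.h>

#define MFP_TACR (*(volatile uint8_t *)0xFFFA19L)  /* Timer A control register */
#define MFP_TADR (*(volatile uint8_t *)0xFFFA1FL)  /* Timer A data (current count) */

static void start_timer_a(void)
{
    MFP_TACR = 0x00;  /* stop the timer */
    MFP_TADR = 0x00;  /* reload value 0 = count down from 256 */
    MFP_TACR = 0x02;  /* delay mode, pre-scale /10 -> one tick ~= 4.069 us */
}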
When the read track command executes, we need to set up a loop that waits for INTRQ to be raised by the FDC, indicating the end of the command. This signal is polled in the loop from bit 5 of the MFP GPIP register. Based on the above information, we can get the transfer time of the characters by looking at when the DMA address register changes. This change happens every 16 received characters, or at the nominal rate every 512 µs.
The pseudo code looks like this:
Code: Select all
Prepare the FDC (select drive and side, seek, etc.)
Prepare the DMA (read mode, buffer address, count)
Prepare the timer (reset, set pre-scale to 10, start)
Issue the read track command
Loop:
    Has the DMA address changed?
        Yes: read the timer and store the elapsed time
             store the new address
    Has the FDC raised INTRQ?
        No: go back to Loop
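For concreteness, a C version of this loop could look like the sketch below. Reading just the low byte of the DMA address counter is enough to detect a change, since every burst advances it by 16; the interrupt masking and the first-chunk problem discussed later are left out, and chunk_time and its size are my own choices:

Code: Select all
#include <stdint.h>

#define DMA_LOW  (*(volatile uint8_t *)0xFF860DL)  /* DMA address counter, low byte */
#define MFP_GPIP (*(volatile uint8_t *)0xFFFA01L)  /* GPIP: bit 5 = FDC INTRQ, active low */
#define MFP_TADR (*(volatile uint8_t *)0xFFFA1FL)  /* Timer A current count (counts down) */

static uint8_t chunk_time[512];   /* ~406 chunks of 16 bytes in a 6500-byte track */

static void measure_track(void)
{
    uint8_t last_addr = DMA_LOW;
    uint8_t last_tick = MFP_TADR;
    unsigned n = 0;

    for (;;) {
        uint8_t addr = DMA_LOW;
        if (addr != last_addr) {              /* a 16-byte burst has completed */
            uint8_t tick = MFP_TADR;
            if (n < sizeof chunk_time)
                chunk_time[n++] = (uint8_t)(last_tick - tick); /* ~128 ticks nominal */
            last_tick = tick;
            last_addr = addr;
        }
        if ((MFP_GPIP & 0x20) == 0)           /* INTRQ asserted: command complete */
            break;
    }
}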
The loop consists of reading the DMA address and the MFP GPIP register, plus two tests to check whether the address has changed or the command has terminated. The loop must take less than 512 µs in order not to miss an address change, which is of course not a problem. But the precision of the measurement is directly related to the duration of the loop (the time when both tests fail). If the loop takes for example 102 µs, the precision is 102/512, or about 20% for 16 bytes (about 1.2% per byte), which is not acceptable. It is therefore important to optimize this loop. For example, I started with a loop time of about 110 µs, optimized it to about 35 µs (still not acceptable), and went down to about 15 µs (the bare minimum). This is acceptable, as it represents a worst-case precision of 15/512 (about 3%) on a chunk of 16 bytes, or about 0.2% per byte. Although you should shorten the loop as much as possible, do not forget to handle the timer overflow (but if you are smart it does not have to affect the loop time).
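For the overflow, one way to keep it out of the loop entirely is to rely on 8-bit modular arithmetic: the counter wraps every 256 ticks (about 1042 µs) while a chunk arrives every ~512 µs (~128 ticks), so the interval can never span a full wrap, and the plain 8-bit subtraction already used in the sketch above is always correct:

Code: Select all
#include <stdint.h>

/* Timer A counts down and wraps every 256 ticks (~1042 us). Since a chunk
 * arrives every ~512 us (~128 ticks), the interval never exceeds one wrap,
 * so (previous - current) is correct modulo 256 and the overflow needs no
 * special handling inside the loop. */
static uint8_t elapsed_ticks(uint8_t prev, uint8_t now)
{
    return (uint8_t)(prev - now);
}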
During the discussion about the timer I mentioned that the pre-scale of 10 (about 4 µs per tick) was a good choice. Indeed, if you look at the values in the loop, 512 µs is about 128 ticks of 4 µs, which happens to be just the median value ($80) of a byte. This is very convenient for storing the time of each 16-byte chunk transferred in an array of bytes. Note also that 4 µs provides a precision in line with the loop time.
If you do not take any more care, you will notice that on a regular basis the values measured/stored are completely off. For example, you get one value largely bigger than normal and the next one shorter, to compensate. It does not take too much time to infer that the problem comes from the processor being diverted to do other things, which in turn means that it is processing interrupts. You can ignore the problem by post-processing the values and replacing the two wrong values with their mean. However, a better solution is to enter a critical section at the beginning of the loop by turning off all the interrupts (note that as far as I know this requires assembly code!).
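For example, with GCC-style inline assembly (on the 68000 the SR can only be written in supervisor mode, so switch first, e.g. with Super(); the function names are mine):

Code: Select all
/* Enter supervisor mode beforehand (e.g. Super(0L) on TOS). */
static unsigned short disable_interrupts(void)
{
    unsigned short sr;
    __asm__ volatile ("move.w %%sr,%0\n\t"
                      "ori.w  #0x0700,%%sr"   /* IPL 7: mask all maskable interrupts */
                      : "=d" (sr) : : "cc");
    return sr;
}

static void restore_interrupts(unsigned short sr)
{
    __asm__ volatile ("move.w %0,%%sr" : : "d" (sr) : "cc");
}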
Now everything works fine, except for one thing: you do not get the timing for the first chunk of data transferred, as the first address increment happens after 16 bytes have been transferred. This may be acceptable, but I have tried to come up with a way to measure this first chunk. The idea I explored was to find a way to identify when the first transfer occurs. As already mentioned, this cannot be inferred by reading the DMA address, as it only changes every 16 bytes. But the DMA has a status register that reflects the state of the FDC DRQ signal in bit 3. The Atari HW documentation explicitly says that it is a bad idea (i.e. don't do it) to query the status register during a DMA transfer. This makes sense, as the DMA has two sides: one toward the FDC to transfer bytes, and one toward the 68000 to read and write the DMA registers (not to mention the DMA transfers themselves). Consequently, it is probably too much load for the DMA to transfer a byte to/from the FDC while the 68000 tries to read/write an internal register.
However, it would be nice to get the time of the first DRQ by reading the DMA status. As a matter of fact, since the transfer has not yet started at that point, we do not create too much perturbation for the DMA, and it works great.
This means that just before the loop already described we need to add another tight loop that just checks for the first DRQ (just after the IP). At the end of this loop we store the current time, which corresponds to the transfer time of the first byte.
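A sketch of this pre-loop (the DRQ bit position is the one given above; I assume the bit reads 1 while DRQ is asserted, so check against your documentation):

Code: Select all
#include <stdint.h>

#define DMA_STATUS (*(volatile uint16_t *)0xFF8606L) /* DMA status when read */
#define MFP_TADR   (*(volatile uint8_t  *)0xFFFA1FL) /* Timer A current count */

static uint8_t first_drq_tick;

static void wait_first_drq(void)
{
    /* Safe to poll here: the DMA transfer proper has not started yet. */
    while ((DMA_STATUS & (1 << 3)) == 0)    /* bit 3 reflects the FDC DRQ */
        ;                                   /* tight poll until the first byte */
    first_drq_tick = MFP_TADR;              /* time of the first assembled byte */
}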
The problem is that it does not return the expected timing. If you look at the transfer diagram you can see that between the first DRQ and the first address change we have the transfer of the 16 bytes (this is good), but we also have some time for the bus to be granted and released by the 68000, plus the time to burst-transfer the bytes from the FIFO to memory. It is only after that that the DMA address register changes. We therefore have some extra time that accumulates in this first chunk measurement. I am now trying to see whether this is a constant, predictable extra time (in which case it would be easy to compensate for) or not.
Also attached is a pdf version of this doc.