What a difference a bit makes - A debugging journey

czietz
Hardware Guru
Posts: 2734
Joined: Tue May 24, 2016 6:47 pm

What a difference a bit makes - A debugging journey

Post by czietz »

Today, I investigated why the PC/MS-DOS emulator SoftPC would not run on my TT, whereas it runs fine under Hatari's TT emulation. After ruling out the usual stuff (i.e., making sure the setup is exactly the same), I already started to suspect an instability in my TT. But I would be proven wrong! This is a very condensed writeup of my investigation.

On the real TT, SoftPC would always crash with a bus error accessing address 0x10 (in user mode, where this address is in fact inaccessible). But why? Thanks to the detailed crash message provided by EmuTOS, I could place a breakpoint on the offending instruction in Hatari. Hatari dutifully disassembled for me what was going on:

Code:

000E043A 3584 6900                move.w d4,(a2,d6.l)
> r
  D0 00000070   D1 000005A9   D2 00000619   D3 00000000
  D4 00000800   D5 000036C5   D6 000006FC   D7 00000619
  A0 001A0412   A1 00000000   A2 0019FD14   A3 0019FD14
  A4 0002DAFE   A5 001A3AD9   A6 001A0414   A7 002D049C
USP  002D049C ISP  00007FCC SFC  00000000 DFC  00000000
CACR 00003111 VBR  00000000 CAAR 00000000 MSP  00000000
SR=0300 T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IM=3 STP=0
I confirmed via the crash message on the real TT that A2 and D6 had similar values, and certainly this instruction should not access address 0x10. So what is going on?
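For reference, a minimal Python sketch of the effective-address arithmetic, using the A2 and D6 values from the register dump above. The helper name `indexed_ea` is made up for illustration, and the scale of 1 and displacement of 0 are assumptions taken from the disassembly as shown:

```python
def indexed_ea(base: int, index: int, scale: int = 1, disp: int = 0) -> int:
    """Effective address of (disp, An, Xn.L*scale), wrapped to 32 bits."""
    return (base + index * scale + disp) & 0xFFFFFFFF


A2 = 0x0019FD14  # base register, from the Hatari dump
D6 = 0x000006FC  # index register, from the Hatari dump

ea = indexed_ea(A2, D6)
print(f"{ea:08X}")  # prints 001A0410
```

The result lies right next to the value in A0 (0x001A0412) and is nowhere near 0x10.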

Help came in the form of the 68030 user manual:
addressing.png
As you can see, the second word of the offending instruction, 0x6900, is a "full format extension word", as its bit 8 is set. However, its "BD SIZE" field (bits 5-4) is "00", which is reserved according to the user manual. Eureka! This is an invalid instruction.

Disassembling the code in the vicinity shows a lot of instances of:

Code:

000E0448 3584 6800                move.w d4,(a2,d6.l,$00)
... where the second word is 0x6800, a valid "brief format extension word".
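The difference between the two extension words can be sketched in a few lines of Python. This is only an illustration of the field layout described in the user manual (bit 8 selects brief vs. full format, bits 5-4 are BD SIZE in the full format, bits 7-0 are the signed displacement in the brief format); the function name `decode_ext_word` and the dictionary keys are made up for this example:

```python
def decode_ext_word(word: int) -> dict:
    """Decode the common fields of a 68020/030 indexed-addressing extension word."""
    fields = {
        # Bit 15: index register is An (1) or Dn (0); bits 14-12: register number
        "index_reg": ("A" if word & 0x8000 else "D") + str((word >> 12) & 7),
        # Bit 11: index size, long (1) or sign-extended word (0)
        "index_size": "L" if word & 0x0800 else "W",
        # Bits 10-9: scale factor 1, 2, 4 or 8
        "scale": 1 << ((word >> 9) & 3),
        # Bit 8: full format (1) or brief format (0)
        "full_format": bool(word & 0x0100),
    }
    if fields["full_format"]:
        bd_size = (word >> 4) & 3
        fields["bd_size"] = bd_size
        fields["bd_size_reserved"] = bd_size == 0  # "00" is reserved
    else:
        disp = word & 0xFF
        fields["disp8"] = disp - 0x100 if disp & 0x80 else disp
    return fields


for w in (0x6900, 0x6800):
    print(f"{w:04X}", decode_ext_word(w))
```

Both words select D6.L with scale 1; only 0x6900 is flagged as full format with the reserved BD SIZE value.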

Therefore, I conclude:
  • The version of SoftPC that I downloaded from the Internet has a bit error, where one 0x6800 was modified to 0x6900.
  • Hatari led me astray by disassembling the instruction and executing it as originally intended without crashing.
  • I suspect that the real 68030 has a minor bug regarding this invalid extension word. It probably wants to fetch the illegal instruction exception vector (which happens to be at address 0x10), but "forgets" to switch to supervisor mode, as it would usually do when handling an exception. This is why I see a bus error at address 0x10, instead of an illegal instruction exception.
Fixing the bit error, i.e., changing the 0x6900 into a 0x6800 makes SoftPC run on my TT. I can now turn my very fast TT into a very slow PC :lol:
IMG_6016.JPG
IMG_6015.JPG
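A rough Python sketch of such a one-bit fix follows. The position of the bad word inside the SoftPC program file is not given in this thread (0x000E043A is a memory address, not a file offset), so `offset` is a hypothetical parameter and the demo runs on made-up bytes rather than the real binary:

```python
BAD = bytes.fromhex("35846900")   # move.w with the invalid extension word
GOOD = bytes.fromhex("35846800")  # same instruction, brief format extension word


def patch_bit_error(image: bytes, offset: int) -> bytes:
    """Return a copy of image with the 4 bytes at offset changed from BAD to GOOD."""
    if image[offset:offset + 4] != BAD:
        raise ValueError("unexpected bytes at offset; refusing to patch")
    return image[:offset] + GOOD + image[offset + 4:]


# The two extension words differ in exactly one bit (bit 8).
flipped = 0x6900 ^ 0x6800
print(f"{flipped:04X}", bin(flipped).count("1"))

# Demo on synthetic data: a NOP, the bad instruction, another NOP.
demo = bytes.fromhex("4e71") + BAD + bytes.fromhex("4e71")
fixed = patch_bit_error(demo, 2)
print(fixed.hex())
```

Checking the expected bytes before writing avoids patching the wrong location if the offset is off.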
ThorstenOtto
Fuji Shaped Bastard
Posts: 3329
Joined: Sun Aug 03, 2014 5:54 pm

Re: What a difference a bit makes - A debugging journey

Post by ThorstenOtto »

Nice find ;) Such things are the ones that can make you debug for hours without getting anywhere.

Btw, I think that should be fixed at least in Hatari's disassembler; then it would have been obvious.
For example, this is what my disassembler spits out:

Code:

[00000000] 3584 6900                 move.w     d4,(a2,d6.l) ; 68020+ only; reserved BD=0
LaceySnr
Captain Atari
Posts: 321
Joined: Wed Jun 26, 2013 5:00 am

Re: What a difference a bit makes - A debugging journey

Post by LaceySnr »

Glad you got it sorted, and thanks for the write-up, that was fun to read! Always sucks when it's one measly bit causing you an issue.
czietz
Hardware Guru
Posts: 2734
Joined: Tue May 24, 2016 6:47 pm

Re: What a difference a bit makes - A debugging journey

Post by czietz »

Addendum: I had disabled TT-RAM, because initially I thought that it was the problem. But now with the fixed version and after re-enabling TT-RAM, SoftPC runs a little less slowly:
IMG_6018.JPG
IMG_6019.JPG
TheNameOfTheGame
Fuji Shaped Bastard
Posts: 2592
Joined: Mon Jul 23, 2012 8:57 pm
Location: Almost Heaven, West Virginia

Re: What a difference a bit makes - A debugging journey

Post by TheNameOfTheGame »

Nice sleuthing, glad you got that working!
mikro
Hardware Guru
Posts: 4566
Joined: Sat Sep 10, 2005 11:11 am
Location: Kosice, Slovakia

Re: What a difference a bit makes - A debugging journey

Post by mikro »

Thanks for the writeup! It's always so tempting to say "It's the emulator's fault!" and it's usually the wrong conclusion, except in the rare cases where it is actually true. ;-)
czietz
Hardware Guru
Posts: 2734
Joined: Tue May 24, 2016 6:47 pm

Re: What a difference a bit makes - A debugging journey

Post by czietz »

czietz wrote: Sat Mar 23, 2024 2:15 pm I suspect that the real 68030 has a minor bug regarding this invalid extension word. It probably wants to fetch the illegal instruction exception vector (which happens to be at address 0x10), but "forgets" to switch to supervisor mode, as it would usually do when handling an exception. This is why I see a bus error at address 0x10, instead of an illegal instruction exception.
The 68030 "bug" for the illegal extension word in the "3584 6900" instruction is even a bit weirder. The EmuTOS crash message also provides the "special status word" (SSW), with more information about the conditions under which the bus error occurred. The SSW is 0x070F in this case, which indicates FC2=FC1=FC0=1, i.e., a CPU space cycle, not even a memory access.
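As a small illustration, the function code can be pulled out of the SSW like this. The sketch assumes only that FC2-FC0 sit in the three low-order bits of the SSW and uses the standard 68030 function code assignments; the names `ssw_function_code` and `describe_fc` are invented for this example, and the other SSW bits are deliberately not interpreted:

```python
# Address spaces selected by FC2..FC0 on the 68030
FUNCTION_CODES = {
    1: "user data space",
    2: "user program space",
    5: "supervisor data space",
    6: "supervisor program space",
    7: "CPU space",
}


def ssw_function_code(ssw: int) -> int:
    """Extract FC2..FC0 (bits 2-0) from the special status word."""
    return ssw & 0x7


def describe_fc(fc: int) -> str:
    """Map a function code to its address space name."""
    return FUNCTION_CODES.get(fc, "reserved/undefined")


fc = ssw_function_code(0x070F)
print(fc, describe_fc(fc))  # prints: 7 CPU space
```

A normal user-mode data write, as the move.w was meant to be, would show function code 1 instead.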
ThorstenOtto
Fuji Shaped Bastard
Posts: 3329
Joined: Sun Aug 03, 2014 5:54 pm

Re: What a difference a bit makes - A debugging journey

Post by ThorstenOtto »

Did you also try to run the instruction in supervisor mode? At least that should not give a bus error.
czietz
Hardware Guru
Posts: 2734
Joined: Tue May 24, 2016 6:47 pm

Re: What a difference a bit makes - A debugging journey

Post by czietz »

ThorstenOtto wrote: Sun Mar 24, 2024 12:45 pm Did you also try to run the instruction in supervisor mode? At least that should not give a bus error.
It still does. It always tries to access 0x10 in CPU space (FC2=FC1=FC0=1), not in memory space.
tOri
Captain Atari
Posts: 246
Joined: Thu Jun 18, 2020 4:30 pm
Location: Poland

Re: What a difference a bit makes - A debugging journey

Post by tOri »

Hi czietz!

Truly hardcore debugging :)
I like it a lot!

Regards
tOri
http://atari.myftp.org ATARI - Power without price and necessary elements
various varieties for Atari and not only - useful or not, but it's worth a look ...
https://reversing.pl/
ijor
Hardware Guru
Posts: 4639
Joined: Sat May 29, 2004 7:52 pm

Re: What a difference a bit makes - A debugging journey

Post by ijor »

czietz wrote: Sat Mar 23, 2024 2:15 pm Today, I investigated why the PC/MS-DOS emulator SoftPC would not run on my TT, whereas it runs fine under Hatari's TT emulation. After ruling out the usual stuff (i.e., making sure the setup is exactly the same), I already started to suspect an instability in my TT. But I would be proven wrong! This is a very condensed writeup of my investigation.
Very interesting research, as always ...
As you can see, the second word of the offending instruction, 0x6900, is a "full format extension word", as its bit 8 is set. However, its "BD SIZE" field is "00", which is reserved according to the user manual. Eureka! This is an invalid instruction.
...
I suspect that the real 68030 has a minor bug regarding this invalid extension word. It probably wants to fetch the illegal instruction exception vector (which happens to be at address 0x10), but "forgets" to switch to supervisor mode, as it would usually do when handling an exception. This is why I see a bus error at address 0x10, instead of an illegal instruction exception.
I'm not an expert on the 030 (or the 020 for that matter), but this doesn't sound very likely. In the first place, IMHO, it wouldn't be just a minor bug, but a rather major one with possible security implications.

I'm not sure this is exactly an "invalid instruction" in the strict sense. According to the manual, illegal instructions are those with an invalid opcode in the first word. It doesn't mention anything about reserved (or invalid) bits in the extension word. I would say that the behavior in this case is undocumented.
The 68030 "bug" for the illegal extension word in the "3584 6900" instruction is even a bit weirder. The EmuTOS crash message also provides the "special status word" (SSW), with more information about the conditions under which the bus error occurred. The SSW is 0x070F in this case, which indicates FC2=FC1=FC0=1, i.e., a CPU space cycle, not even a memory access.
It sounds like the CPU might be trying to execute a breakpoint cycle. But we would need a logic analyzer trace to see what is really going on.
Fx Cast: Atari St cycle accurate fpga core
czietz
Hardware Guru
Posts: 2734
Joined: Tue May 24, 2016 6:47 pm

Re: What a difference a bit makes - A debugging journey

Post by czietz »

Although if it was a regular breakpoint cycle, terminating it with a bus error would invoke the illegal instruction exception, not the bus error exception. So, this is most probably undefined/undocumented behavior.
Screenshot_20240330_151849_Samsung Notes.png
ijor
Hardware Guru
Posts: 4639
Joined: Sat May 29, 2004 7:52 pm

Re: What a difference a bit makes - A debugging journey

Post by ijor »

czietz wrote: Sat Mar 30, 2024 2:20 pm Although if it was a regular breakpoint cycle, terminating it with a bus error would invoke the illegal instruction exception, not the bus error exception. So, this is most probably undefined/undocumented behavior.
Yes, indeed, it doesn't seem to be an exact breakpoint cycle access. But, again, I think we would need a logic analyzer trace to see exactly what is going on.
Fx Cast: Atari St cycle accurate fpga core