Well, what bothers me is that "fully working". That's a very misleading formulation. One would just think that it is flawless and that the whole game can be played without problems. It would be much better to put only "working" there, or something to that effect, in all cases where it is not fully tested (to stay with that word).
And it still says the same, even though you tested it yourself. I don't get how you don't see the contradiction there.
But this is actually an easy case, as already said, since you can reach zone 5 directly, without a longer play(test). What about the thousands of games with no start level selection? And here starts my longer blah:
Yes, you are perfectly right that testing thousands of floppy images is enormous work. I have said here several times that only with some coordinated testing is it possible to test it all thoroughly.
I myself found many errors, even in originals, not because I play so much, but because of hard disk adaptations. I got a lot of error reports from Atari people.
I even wanted to start a section here about bad cracks and bad releases, but some people just did not like it. There is a really bad attitude among certain people about that: for instance, I wrote some error reports on the D-Bug forum about their cracks. And what did they do? Deleted the posts, banned me, called me a troll ... instead of just fixing the error, or adding a note that the release has this or that error. And I know for sure that there were reports from other people too - like for Helter Skelter. This was just one example, a crew that was still active some years ago. But most of those who made cracks and menu disks are not reachable. Some are active here, indeed, but I don't see that they care much about quality info for their releases, so to speak.
Maybe I am more sensitive about all of it, since I spend a lot of time dealing with it. For instance, I wasted some 5-6 hours on that bad Zone Warrior crack, while they would have needed 5 minutes at most to check whether all 5 zones work. And it is not only that 5 files are shorter than the originals; there is also some mess with the file loading order in the code, and errors in the RAMdisk code. So I needed to do almost everything again with the Replicants release - which has the same number of files, but pretty different code for loading them. All of it would have been faster, less troublesome and more reliable with an image of the original.
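A quick first check of the "shorter files" kind mentioned above could be sketched in Python roughly like this. It assumes the files of both releases have already been extracted from the floppy images into two folders. The function names and the folder layout are made up for illustration. Different content is normal between releases (packing, patches), so a finding is only a hint about where to look, and it does not replace actually loading every zone.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def list_files(folder):
    """Map upper-cased file names to paths (Atari ST names are not case-sensitive)."""
    return {p.name.upper(): p for p in Path(folder).iterdir() if p.is_file()}


def compare_releases(reference_dir, suspect_dir):
    """List files of the reference release that are missing, shorter,
    longer, or changed in the suspect release (hypothetical helper)."""
    reference = list_files(reference_dir)
    suspect = list_files(suspect_dir)
    findings = []
    for name in sorted(reference):
        if name not in suspect:
            findings.append((name, "missing"))
            continue
        ref_size = reference[name].stat().st_size
        sus_size = suspect[name].stat().st_size
        if sus_size < ref_size:
            findings.append((name, f"shorter by {ref_size - sus_size} bytes"))
        elif sus_size > ref_size:
            findings.append((name, f"longer by {sus_size - ref_size} bytes"))
        elif file_digest(reference[name]) != file_digest(suspect[name]):
            findings.append((name, "same size, different content"))
    return findings
```

Called as `compare_releases("release_a_files", "release_b_files")` (both folder names are placeholders), it returns a list of `(file name, finding)` pairs, empty if every reference file has an identical counterpart.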
I recommend a look here:
http://forum.8bitchip.info/preservation-error-reports/'bad'-games-list
There is almost everything that I have found in the last 10 years. Some people contributed too.
What I can say is that it is not rare for a game to work fine until the very late, final levels, and then there is a crash or error. The usual cause is extra protection. Take Wrath of the Demon - it has plenty of checksums at all levels, but the final level's load and check are totally different. So the only way to get it right is to reach that level and test there. Indeed, that's much easier today with an emulator that has a debugger than in the old times with real HW. It means that we can now make better cracks and hard disk adaptations in less time than 20-30 years ago. And I can say that I have done a lot: over 1000 games for hard disk, and a lot of diverse floppy release fixes - with many extra fixes too for better TOS version compatibility and the like.
When I proposed here a database for Atari ST SW, which would contain quality info for all the diverse releases of a title, the usual criticism was that I wanted it in order to advertise my site with hard disk adaptations and the like. Now, imagine how much it would help your own site.
The famous Schrödinger's cat thought experiment says that the cat is both dead and alive until we open the box and see the condition of the poor animal, which deserved better logic. The cat is always in some definite state, regardless of whether an observer is able to see what that state is.