I believe I have a working, empirically tested hypothesis, consistent with theory, for how the wakestates arise.

As always, knowledge does not appear out of thin air. Thanks to Dio for measurements and theory, to mc6809e for suggesting that the 32->8 MHz clock boundaries are causal, and to Paolo, as always, for the wakestate discovery, tests and documentation found in his Excel sheet.
With that said:
Prerequisites:
1) As seen from the CPU, the GLUE, while also running at 8 MHz, can be offset by 0-3 cycles due to the lack of synchronization at power-on. This means that when we talk about "cycle 56", it can in reality mean cycle 56-59 for GLUE.
2) Both FREQ (ff820a) and RES (ff8260) are inside GLUE. While the Shifter has a copy of ff8260, all modifications to these two registers that cause sync changes are handled by GLUE and GLUE only.
3) When GLUE checks the state of FREQ and RES, the RES check happens one cycle later than the FREQ check, for unknown implementation/wiring/signal-propagation reasons.
Result:
Together, these give the following timing possibilities if we look at when GLUE really checks (as seen by the CPU) the interesting "cycle 56" position for a 000-byte line:
FREQ 56, RES 57
FREQ 57, RES 58
FREQ 58, RES 59
FREQ 59, RES 60
If FREQ == 50 and RES == LO when these checks are made, H (and thus DE) is raised one cycle after the RES check, i.e. at cycle 58-61.
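A minimal Python sketch of the timing above, under the assumptions stated in this post: the GLUE offset (0-3 cycles) adds directly to every check position, the RES check trails the FREQ check by one cycle, and H is raised one cycle after the RES check. The function name `glue_checks` is hypothetical and only illustrates the arithmetic.

```python
# Hypothetical model of when GLUE samples FREQ and RES for the "cycle 56"
# position, as seen from the CPU, for each possible power-on offset.

NOMINAL_CHECK = 56  # the "cycle 56" position (CPU view)


def glue_checks(offset: int) -> tuple[int, int, int]:
    """Return (freq_check, res_check, h_raised) in CPU cycles.

    Assumptions: offset is GLUE's unsynchronized lag (0-3 cycles), the RES
    check is one cycle after the FREQ check, and H (thus DE) is raised one
    cycle after the RES check when FREQ == 50 and RES == LO.
    """
    if not 0 <= offset <= 3:
        raise ValueError("GLUE offset must be 0-3 cycles")
    freq_check = NOMINAL_CHECK + offset
    res_check = freq_check + 1
    h_raised = res_check + 1
    return freq_check, res_check, h_raised


for g in range(4):
    f, r, h = glue_checks(g)
    print(f"offset {g}: FREQ {f}, RES {r}, H/DE raised at {h}")
```

The four offsets reproduce the FREQ/RES pairs listed above and place the H/DE rise at cycles 58-61.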
Dio has documented that, depending on wakestate, the lag from DE being raised to the MMU raising LOAD is 3-6 cycles. The MMU is not affected by the GLUE 0-3 cycle offset, which means it looks like this:
MMU detects GLUE DE at cycle 62* and raises LOAD at cycle 64
64-58 = 6 = DL6
64-59 = 5 = DL5
64-60 = 4 = DL4
64-61 = 3 = DL3
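The same subtraction as a small sketch, assuming (as above) that the MMU always raises LOAD at cycle 64 regardless of the GLUE offset, and that DE rises at 58 when the offset is zero. The constant and function names are illustrative only.

```python
# Hypothetical model: the MMU is not affected by GLUE's offset, so it always
# raises LOAD at cycle 64 (CPU view). The DE-to-LOAD lag therefore depends
# only on when GLUE raised DE.

LOAD_CYCLE = 64  # MMU raises LOAD here (after detecting DE at 62)
DE_BASE = 58     # DE raised at 58 when GLUE has zero offset


def de_to_load_lag(glue_offset: int) -> int:
    """Return the DL state (lag in cycles) for a GLUE offset of 0-3."""
    if not 0 <= glue_offset <= 3:
        raise ValueError("GLUE offset must be 0-3 cycles")
    de_raised = DE_BASE + glue_offset
    return LOAD_CYCLE - de_raised


for g in range(4):
    print(f"DE at {DE_BASE + g} -> DL{de_to_load_lag(g)}")
```

Because LOAD is fixed while DE moves with the GLUE offset, each extra cycle of GLUE lag removes exactly one cycle of DE-to-LOAD lag, giving DL6 down to DL3.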
If we for a moment shift our attention to the visible pixels on screen, we must remember that GLUE decides (through HSYNC) where the screen is physically placed by the monitor. If GLUE is "late" by 3 cycles, as in the DL3 example above, the distance between the start of the screen and the LOADed pixels displayed by the Shifter will be shorter. We should thus see the screen shift by one pixel per DL state above, which is exactly what Dio has documented: DL3 leftmost, DL6 rightmost.
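A sketch of the pixel-shift argument, assuming that a GLUE lag of g cycles moves HSYNC g cycles later while LOAD stays fixed, and that one 8 MHz cycle corresponds to one low-resolution pixel (320 pixels fetched over 320 cycles). Positions are only relative to the DL6 case; the absolute HSYNC timing is not modelled.

```python
# Hypothetical sketch: GLUE places HSYNC, so a GLUE that is late by g cycles
# also moves HSYNC g cycles later, while LOAD stays at the MMU's fixed cycle.
# Assumption: one 8 MHz cycle corresponds to one low-resolution pixel.

PIXELS_PER_CYCLE = 1  # low resolution: 320 pixels over 320 cycles


def dl_state(glue_offset: int) -> int:
    """DL6 for offset 0 down to DL3 for offset 3."""
    return 6 - glue_offset


def horizontal_shift(glue_offset: int) -> int:
    """Picture position relative to the DL6 case, in pixels (negative = left).

    The HSYNC-to-LOAD distance shrinks by one cycle per cycle of GLUE lag,
    so the displayed pixels move left within the monitor's frame.
    """
    return -glue_offset * PIXELS_PER_CYCLE


for g in range(4):
    print(f"DL{dl_state(g)}: shift {horizontal_shift(g):+d} px")
```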
Alright, back to GLUE being offset 0-3 cycles compared to the CPU. There's no way for us to test or detect this from software; we simply don't have one-cycle resolution on the ST. We do, however, have two-cycle resolution thanks to the use of EXG before MOVE instructions. If we time the changes to FREQ and RES as finely as possible for GLUE to pick them up, it looks like this:
Changes made by CPU at FREQ 56, RES 56 - read by GLUE at FREQ 56, RES 57
Changes made by CPU at FREQ 56, RES 58 - read by GLUE at FREQ 57, RES 58
Changes made by CPU at FREQ 58, RES 58 - read by GLUE at FREQ 58, RES 59
Changes made by CPU at FREQ 58, RES 60 - read by GLUE at FREQ 59, RES 60
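One way to see why four GLUE offsets become four distinct CPU-visible positions: a sketch assuming CPU writes can only be placed on even cycles, and that the write matching a GLUE check at cycle c is c rounded down to the nearest even cycle. The helper names are hypothetical.

```python
# Hypothetical sketch: the CPU can only place a register write with two-cycle
# resolution (even cycles, using EXG padding before the MOVE). Assumption: the
# write that lines up with a GLUE check at cycle c is c rounded down to even.


def cpu_write_cycle(glue_check: int) -> int:
    return glue_check - (glue_check % 2)


def observed_positions(glue_offset: int) -> tuple[int, int]:
    """CPU-visible (FREQ, RES) change positions for a GLUE offset of 0-3."""
    freq_check = 56 + glue_offset  # FREQ checked at 56..59
    res_check = freq_check + 1     # RES checked one cycle later
    return cpu_write_cycle(freq_check), cpu_write_cycle(res_check)


for g in range(4):
    f, r = observed_positions(g)
    print(f"CPU FREQ {f}, RES {r} - read by GLUE at FREQ {56 + g}, RES {57 + g}")
```

Since the RES check is one cycle after the FREQ check, the two checks straddle an even-cycle boundary differently for each offset, so all four offsets remain distinguishable from software even without one-cycle resolution.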
The above is not guesswork. These are the exact values documented by Paolo, and I have posted them before, only in this form:
WS1 (DL6): screen DE is FREQ 56, RES 56
WS3 (DL5): screen DE is FREQ 56, RES 58
WS4 (DL4): screen DE is FREQ 58, RES 58
WS2 (DL3): screen DE is FREQ 58, RES 60
(Verify with the Excel sheet: the default RES values are for WS3/4; in WS1 all RES state checks happen 2 cycles earlier, in WS2 2 cycles later. The default FREQ values are for WS1/3; in WS2/4 all FREQ state checks happen 2 cycles later. The above is the result for one specific position.)
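As a cross-check, a sketch that encodes the Excel-sheet rules just described (RES default 58 for WS3/4, -2 in WS1, +2 in WS2; FREQ default 56 for WS1/3, +2 in WS2/4) and compares them with a GLUE-offset model in which FREQ is checked at 56+offset, RES one cycle later, and CPU writes are rounded down to even cycles. The offset for each wakestate is taken from the DL mapping listed above (offset = 6 - DL); all names are illustrative.

```python
# Hypothetical cross-check of the wakestate rules against the GLUE-offset model.

RES_DEFAULT = 58   # WS3/WS4 at the "cycle 56" position
FREQ_DEFAULT = 56  # WS1/WS3 at the same position

RES_SHIFT = {1: -2, 2: +2, 3: 0, 4: 0}
FREQ_SHIFT = {1: 0, 2: +2, 3: 0, 4: +2}

# Wakestate -> DE-to-LOAD lag, as listed in the post.
WS_TO_DL = {1: 6, 3: 5, 4: 4, 2: 3}


def excel_positions(ws: int) -> tuple[int, int]:
    return FREQ_DEFAULT + FREQ_SHIFT[ws], RES_DEFAULT + RES_SHIFT[ws]


def offset_model_positions(glue_offset: int) -> tuple[int, int]:
    # GLUE checks FREQ at 56+offset and RES one cycle later; CPU writes
    # can only be placed on even cycles (rounded down).
    freq_check = 56 + glue_offset
    res_check = freq_check + 1
    return freq_check - freq_check % 2, res_check - res_check % 2


for ws in (1, 3, 4, 2):
    dl = WS_TO_DL[ws]
    offset = 6 - dl  # DL6 -> offset 0 ... DL3 -> offset 3
    agree = excel_positions(ws) == offset_model_positions(offset)
    print(f"WS{ws} (DL{dl}): Excel {excel_positions(ws)}, "
          f"offset {offset} model {offset_model_positions(offset)}, agree={agree}")
```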
Conclusion: The known detectable wakestates are the result of GLUE being offset 0-3 cycles compared to the CPU. This fits theory (unsynchronized initialisation at the same clock) as well as observation (DE-to-LOAD lag, visible pixel position on screen) and empirical testing of when changes to FREQ and RES have to be made for GLUE to detect them.
Comments welcome

(There might be some uncertainty as to exactly which cycle each signal is raised and detected at, but I don't believe it changes the conclusion.)
/Troed
PS: All other known wakestates (Spectrum 512 dots, banded/non-banded etc.) are most likely in the Shifter, whose 32 MHz clock, which it then divides further, causes similar possibilities. The Shifter will also be "lagged", since it receives DE directly from GLUE with the possible 0-3 cycle offset, yet receives LOAD and data from the MMU at a fixed cycle (as fixed as it can be given the different clocks, 16 and 32 MHz). That might be why WS1/DL6 can't be made unstable for 4-pixel scrolling; I'll get back to that when I've "finished" the Shifter state machine.
edit: Oh, and yes, this should explain why there's a WS3 that sometimes behaves as WS1 when it comes to unstabilization: the mapping between GLUE wakestates and the Shifter isn't 1:1, although the Shifter seems to be highly influenced by them.
*) Why doesn't the MMU detect DE already at cycle 58 or 60 when it is raised that early? I don't know.