I don't know this compiler, but here is my guess.
Selecting a CPU type in any kind of compiler results in a bunch of assumptions about the machine - some of which will be wrong. On the Falcon the biggest 'wrong assumption' would be a 32-bit data bus - the compiler may start generating code in terms of 32-bit longs instead of 16-bit shorts, expecting alignment-oriented gains and instead getting a horrible slowdown, since every long access has to be split into two cycles on the Falcon's 16-bit bus. You could probably test this scenario by running the same comparison on a TT (or any 030 board with local fastram).
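If you have a C compiler handy, a rough way to see the bus effect on its own is to time the same loop over 16-bit and 32-bit data. This is only a sketch: the names (fill_arrays, sum_words, sum_longs, time_seconds) and the sizes are made up for illustration, the arrays are just chosen to be much bigger than the 030's 256-byte data cache, and clock() may be too coarse on some setups, in which case raise PASSES.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define N      4096   /* elements per array (arbitrary, well beyond the data cache) */
#define PASSES 200    /* repetitions so the run is long enough to time */

static int16_t words[N];  /* 16-bit elements */
static int32_t longs[N];  /* 32-bit elements: two bus cycles each on a 16-bit bus */

/* Fill both arrays with the same small values so the sums must match. */
void fill_arrays(void)
{
    int i;
    for (i = 0; i < N; i++) {
        words[i] = (int16_t)(i & 0xFF);
        longs[i] = (int32_t)(i & 0xFF);
    }
}

/* Sum the 16-bit array PASSES times. */
int32_t sum_words(void)
{
    int32_t total = 0;
    int p, i;
    for (p = 0; p < PASSES; p++)
        for (i = 0; i < N; i++)
            total += words[i];
    return total;
}

/* Sum the 32-bit array PASSES times. */
int32_t sum_longs(void)
{
    int32_t total = 0;
    int p, i;
    for (p = 0; p < PASSES; p++)
        for (i = 0; i < N; i++)
            total += longs[i];
    return total;
}

/* Run fn once, store its result in *result, return elapsed CPU time in seconds. */
double time_seconds(int32_t (*fn)(void), int32_t *result)
{
    clock_t start = clock();
    *result = fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call fill_arrays() first, then time_seconds(sum_words, &r) and time_seconds(sum_longs, &r), and print the two results so the compiler can't throw the loops away. If the long version is clearly slower on the Falcon but roughly level on a TT, that points at the bus rather than the compiler.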
The 68030 also behaves differently from the 68020 - it has a data cache, and this can occasionally get in the way if abused - particularly on the Falcon's data bus (the data cache really doesn't like that 16-bit wide fetching!). This one is harder to isolate, but you could test it on a TT with the cache on and off and compare the ratios...
And then there's the compiler itself - it may assume that 020+ addressing modes and bitfield ops (etc.) are faster than simpler 68000 ops, but this often isn't the case either (this part is at least machine-independent, so it should be possible to get that right most of the time). You can only identify this one as 'whatever remains' after the other tests, if it can be separated out at all.
In summary, CPU optimizations sometimes aren't.
They can and should be, but I've seen some of the best compilers get that stuff wrong on a regular basis. I wouldn't be surprised if Omikron Basic has similar problems.
Even if you identify what the problem is, I doubt you can do much to control it except turn the 68020 flag on or off.
The main value of the 020+ compile modes is hardware floating point, since 68881/2 ops only interface directly with the 020 and upwards.