The worst hack in Steem

Postby Steven Seagal » Thu Dec 15, 2011 9:00 pm

Sorry if this post is a bit obscure, it's for reference. I'm trying to understand the main loop of Steem. I think it is both clever and dangerous.

Steem Authors wrote:The M68K runs at 8MHz. CPU timings are measured by two int variables. cpu_cycles stores the number of cycles remaining until the next event (see next section). cpu_timer keeps a running total of the number of cycles executed at the next event. The number of cycles since the emulator started is therefore given by:

#define ABSOLUTE_CPU_TIME (cpu_timer-cpu_cycles)

Because this is an int value, it loops once every 8.5 minutes (approximately). All code takes this into account.


The worst hack in Steem is not from me! They aren't joking: cpu_timer, which is an int (signed 32-bit), does loop, and it becomes negative halfway through (at 4 min+). Somehow, magically, the program all holds together (something - cpu_timer_at_such_time still works with negative values). But in C/C++, "looping" or "going negative" by adding too much to an int is called signed overflow, and it results in undefined behaviour according to the standard, which means it's not good practice at all. I guess it depends on the processor, and I'm not really interested in examining the bit representation on Intel.
I think this technique could also cause bugs in border removal detection when cpu_timer is at the threshold, because the < and > operators don't work like the - operator. I know I had a bug at 4 min+ in Nostalgia/Lemmings because of this.
In the CIA we learned that ST ruled
Steven Seagal
Atari Super Hero
Posts: 928
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed

Re: The worst hack in Steem

Postby Hippy Dave » Thu Dec 15, 2011 10:15 pm

Steven Seagal wrote:Sorry if this post is a bit obscure, it's for reference. I'm trying to understand the main loop of Steem. I think it is both clever and dangerous.

Steem Authors wrote:The M68K runs at 8MHz. CPU timings are measured by two int variables. cpu_cycles stores the number of cycles remaining until the next event (see next section). cpu_timer keeps a running total of the number of cycles executed at the next event. The number of cycles since the emulator started is therefore given by:

#define ABSOLUTE_CPU_TIME (cpu_timer-cpu_cycles)

Because this is an int value, it loops once every 8.5 minutes (approximately). All code takes this into account.


The worst hack in Steem is not from me! They aren't joking: cpu_timer, which is an int (signed 32-bit), does loop, and it becomes negative halfway through (at 4 min+). Somehow, magically, the program all holds together (something - cpu_timer_at_such_time still works with negative values). But in C/C++, "looping" or "going negative" by adding too much to an int is called signed overflow, and it results in undefined behaviour according to the standard, which means it's not good practice at all. I guess it depends on the processor, and I'm not really interested in examining the bit representation on Intel.
I think this technique could also cause bugs in border removal detection when cpu_timer is at the threshold, because the < and > operators don't work like the - operator. I know I had a bug at 4 min+ in Nostalgia/Lemmings because of this.


This method of using timers is well known and common in both hardware and software, although unsigned integers would be a better choice. Note that as long as "cpu_timer - cpu_cycles" doesn't become negative (0 to +2147483647 only), things will work fine. The technique assumes two's-complement integer math. Are there any computers that don't use two's complement?
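
As a rough illustration of the unsigned variant suggested here (a sketch only, not Steem's actual code; elapsed_cycles is a made-up name), an interval between two readings of a free-running 32-bit counter can be computed like this:

```cpp
#include <cstdint>

// Hypothetical helper, not part of Steem: cycles elapsed between two
// readings of a free-running 32-bit cycle counter.
// Unsigned subtraction is defined modulo 2^32, so the result is correct
// even if the counter wrapped between 'earlier' and 'later', provided the
// true interval is shorter than 2^32 cycles.
constexpr std::uint32_t elapsed_cycles(std::uint32_t earlier, std::uint32_t later) {
    return later - earlier;
}

// The counter wrapped from 0xFFFFFFF0 to 0x00000010: 32 cycles elapsed.
static_assert(elapsed_cycles(0xFFFFFFF0u, 0x00000010u) == 32u, "wraps cleanly");
```

Because the wrap-around is part of the defined behaviour of unsigned types, this version does not depend on how the compiler treats signed overflow.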
Hippy Dave
Captain Atari
Posts: 448
Joined: Sat Jan 10, 2009 5:40 am

Re: The worst hack in Steem

Postby Steven Seagal » Thu Dec 15, 2011 10:26 pm

Well, I knew nothing about that; that's new to me. All I can guess is that it makes assumptions about the hardware it's run on.
Up to now I could live without investigating damn "2's complement", and I would so much like to keep it that way.

Re: The worst hack in Steem

Postby Dio » Thu Dec 15, 2011 11:00 pm

Steven Seagal wrote:when you "loop" or "go negative" by adding too much to an int, and it results in undefined behaviour according to the standard

It's officially undefined. However, there is no modern CPU that implements it in any way other than wrapping from max int to max negative int (and by 'modern' I mean "that I've ever programmed", and I've programmed at least a dozen asms in my time).

You are right, though, that the comparison operators are a lot more sensitive to this stuff - this is why MSVC warns on comparing signed and unsigned quantities, for example.
Dio
Captain Atari
Posts: 446
Joined: Thu Feb 28, 2008 3:51 pm

Re: The worst hack in Steem

Postby DrCoolZic » Thu Dec 15, 2011 11:18 pm

I have also noticed in the source that the signed/unsigned distinction is not always used properly.
When compiling with a reasonable warning level you will find a lot of warnings related to that. For example, there are a lot of wrong initializations such as char x = 200;

To get back to your point, the C/C++ standard is clear about signed overflow: undefined behavior.

For example, with something like
Code:
signed int i = 2147483647;   // INT_MAX for a 32-bit int
i++;

the result is not predictable (it can vary from one architecture to another) and the language washes its hands of the issue: it doesn't care!

If you know your architecture and your compiler, and what instruction sequences it generates, you are not invoking undefined behaviour. Adding two signed integers is a WELL DEFINED operation in IA32 assembly language. Now all you need to know is what your compiler does and you're all set! It's PERFECTLY LEGAL! But when you look at it from a pure C++ standard point of view, CRASHING IS ALSO PERFECTLY LEGAL!

Actually, overflow behavior is always undefined, in theory even for unsigned integers. But that can never happen, because the standard says that unsigned integers don't overflow at all. The wrap-around is just part of normal unsigned integer behavior and is not seen as overflow: Subclause 3.9.1, paragraph 4 says that unsigned integers obey the laws of arithmetic modulo 2^n (2^n here of course means 2 raised to the power of n), where n is the number of bits in the value representation.

And by the way yes I am not aware of an architecture not using 2's complement.
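
To make the signed/unsigned difference concrete, here is a small sketch (the helper names are invented for illustration). The first function detects a signed overflow before it happens, without ever performing the overflowing addition; the second relies on the modulo-2^n rule for unsigned types:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: returns true if a + b would overflow a 32-bit signed
// int. The check uses only comparisons and in-range arithmetic, so it never
// performs the overflowing addition itself.
constexpr bool add_would_overflow(std::int32_t a, std::int32_t b) {
    return (b > 0 && a > std::numeric_limits<std::int32_t>::max() - b) ||
           (b < 0 && a < std::numeric_limits<std::int32_t>::min() - b);
}

// Unsigned arithmetic, by contrast, is defined to wrap modulo 2^32.
constexpr std::uint32_t wrapping_increment(std::uint32_t x) {
    return x + 1u;
}

static_assert(add_would_overflow(2147483647, 1), "INT_MAX + 1 overflows");
static_assert(wrapping_increment(0xFFFFFFFFu) == 0u, "unsigned wraps to 0");
```

This is only one way to write such a check; the point is that the signed case needs a guard, while the unsigned case needs none.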
DrCoolZic
Atari God
Posts: 1401
Joined: Mon Oct 03, 2005 7:03 pm
Location: France

Re: The worst hack in Steem

Postby DrCoolZic » Fri Dec 16, 2011 1:33 pm

I ran some tests on Windows using VC++ 2010,
with the following code:
Code:
#include <stdio.h>

int main() {
   signed int i, j, k;
   bool ij, jk;
   
   i = 2147483648;   // does not fit in a signed int: VC++ stores 0x80000000 (-2147483648)
   i++;
   j = i - 1;
   k = i - 2;        // wraps below the minimum value
   ij = i > j;
   jk = j > k;
   fprintf(stdout, "%d %d %d *** %u %u %u *** %d %d\n", i, j, k, i, j, k, ij, jk);
}

Guess the answer:
Code:
-2147483647 -2147483648 2147483647 *** 2147483649 2147483648 2147483647 *** 1 0

The assignment i = 2147483648 does not fit in a signed int, so i starts out as the minimum negative number (-2147483648, stored as 0x80000000); i++ then gives -2147483647. Then i - 1 gives the minimum negative number again, and i - 2 underflows, resulting in a positive number.
That is what you get if you print the numbers as signed int (interpreted as two's complement); if you print them as unsigned int it is "easier to follow".
The last two printed values are the comparisons: you get i > j but j < k, which makes sense but is probably not what you expect.

Let's look at the assembly (not optimized):
Code:
; 4    :    signed int i, j, k;
; 5    :    bool ij, jk;
; 6    :    
; 7    :    i = 2147483648;
   mov   DWORD PTR _i$[ebp], -2147483648      ; 80000000H
; 8    :    i++;
   mov   eax, DWORD PTR _i$[ebp]
   add   eax, 1
   mov   DWORD PTR _i$[ebp], eax
; 9    :    j = i - 1;
   mov   eax, DWORD PTR _i$[ebp]
   sub   eax, 1
   mov   DWORD PTR _j$[ebp], eax
; 10   :    k = i - 2;
   mov   eax, DWORD PTR _i$[ebp]
   sub   eax, 2
   mov   DWORD PTR _k$[ebp], eax
; 11   :    ij = i > j;
   mov   eax, DWORD PTR _i$[ebp]
   xor   ecx, ecx
   cmp   eax, DWORD PTR _j$[ebp]
   setg   cl
   mov   BYTE PTR _ij$[ebp], cl
; 12   :    jk = j >k;
   mov   eax, DWORD PTR _j$[ebp]
   xor   ecx, ecx
   cmp   eax, DWORD PTR _k$[ebp]
   setg   cl
   mov   BYTE PTR _jk$[ebp], cl

OK, so now what would happen if we replaced the declaration of i, j, k with unsigned?
The result is:
Code:
-2147483647 -2147483648 2147483647 *** 2147483649 2147483648 2147483647 *** 1 1

This is exactly the same result, with the exception of the comparisons. In this case it is more intuitive: i > j and j > k.
If we look at the assembly:
Code:
; 4    :    unsigned int i, j, k;
; 5    :    bool ij, jk;
; 6    :    
; 7    :    i = 2147483648;
   mov   DWORD PTR _i$[ebp], -2147483648      ; 80000000H
; 8    :    i++;
   mov   eax, DWORD PTR _i$[ebp]
   add   eax, 1
   mov   DWORD PTR _i$[ebp], eax
; 9    :    j = i - 1;
   mov   eax, DWORD PTR _i$[ebp]
   sub   eax, 1
   mov   DWORD PTR _j$[ebp], eax
; 10   :    k = i - 2;
   mov   eax, DWORD PTR _i$[ebp]
   sub   eax, 2
   mov   DWORD PTR _k$[ebp], eax
; 11   :    ij = i > j;
   mov   eax, DWORD PTR _i$[ebp]
   cmp   DWORD PTR _j$[ebp], eax
   sbb   ecx, ecx
   neg   ecx
   mov   BYTE PTR _ij$[ebp], cl
; 12   :    jk = j > k;
   mov   eax, DWORD PTR _j$[ebp]
   cmp   DWORD PTR _k$[ebp], eax
   sbb   ecx, ecx
   neg   ecx
   mov   BYTE PTR _jk$[ebp], cl

We can see that all arithmetic operations on the numbers are exactly the same whether they are signed or unsigned. However, the comparisons are different, because the compiler has to use a signed comparison (setg) in one case and an unsigned one (based on the carry flag) in the other.

The above results are not a surprise, but they are somewhat specific to VC++ and of course to Intel asm.

So, back to your original post: doing arithmetic on signed or unsigned values should probably give the same results, but it must be treated carefully. For example, in cpu_timer-cpu_cycles I assume that cpu_timer is always bigger (from an "unsigned point of view") than cpu_cycles, otherwise you may be in trouble.
However, if you try to compare the cpu_timer-cpu_cycles value with other values, you will almost certainly have problems.
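
One common way around the comparison problem is to compare the difference of two timestamps rather than the timestamps themselves. The following is a sketch under that assumption; time_before is a made-up name, and nothing here is taken from Steem:

```cpp
#include <cstdint>

// Hypothetical helper: true if timestamp 'a' is earlier than 'b' on a
// wrapping 32-bit cycle counter. Valid only while the two timestamps are
// less than 2^31 cycles apart.
constexpr bool time_before(std::uint32_t a, std::uint32_t b) {
    // a - b wraps modulo 2^32; the top bit of the difference tells which
    // side of 'b' the value 'a' lies on.
    return static_cast<std::uint32_t>(a - b) > 0x7FFFFFFFu;
}

// 0x7FFFFFFF comes one cycle before 0x80000000, although a plain signed
// comparison of the same bit patterns (2147483647 vs. -2147483648) says
// the opposite.
static_assert(time_before(0x7FFFFFFFu, 0x80000000u), "ordered across the sign flip");
static_assert(time_before(0xFFFFFFF0u, 0x00000010u), "ordered across the wrap");
```

The design choice is to stay entirely in unsigned arithmetic, so the helper relies neither on signed overflow nor on an implementation-defined conversion to a signed type.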

Bottom line: cpu_timer and cpu_cycles (in steemh.h) should probably be converted to unsigned int.
But as I already said, this seems to be a recurring problem in Steem. For example, to avoid C4309 warnings (truncation) in steemh.h I had to declare:
Code:
unsigned short m68k_src_w;   // jlg_4309
unsigned long m68k_src_l;   // jlg_4309
unsigned char m68k_src_b;   // jlg_4309

...

Re: The worst hack in Steem

Postby Dio » Fri Dec 16, 2011 4:43 pm

Thinking about it I'm not certain it would be entirely possible to emulate a 68000 on a platform that didn't follow the typical conventions, because the 68000 does not distinguish between signed and unsigned quantities. At the very least it would be hard work and slow - you would probably have to use 64-bit arithmetic and promote everything to that.

So the assumption of a standard platform is implicit at many more levels than you might think.

Re: The worst hack in Steem

Postby DrCoolZic » Fri Dec 16, 2011 5:11 pm

Dio wrote:Thinking about it I'm not certain it would be entirely possible to emulate a 68000 on a platform that didn't follow the typical conventions, because the 68000 does not distinguish between signed and unsigned quantities. At the very least it would be hard work and slow - you would probably have to use 64-bit arithmetic and promote everything to that.

So the assumption of a standard platform is implicit at many more levels than you might think.

I do not think that an emulator engine is related (or should be related) at all to the platform running it?

Re: The worst hack in Steem

Postby Dio » Fri Dec 16, 2011 5:29 pm

Well, if you want to rewrite it, at 1/3 the performance, knock yourself out :D . I'd happily lay money that there is no 68000 core out there that will work on a non-2's complement CPU.

Re: The worst hack in Steem

Postby DrCoolZic » Fri Dec 16, 2011 6:49 pm

It would be difficult to find a non-2's-complement CPU anyway ;)

I think I now understand that you are talking about the performance of the emulator?

Re: The worst hack in Steem

Postby Dio » Fri Dec 16, 2011 7:10 pm

Yes.

You could always write an emulator for a non-2's complement CPU and write an ST emulator on it. :D

Re: The worst hack in Steem

Postby Steem Authors » Wed Dec 28, 2011 9:50 pm

Actually we didn't know that int32 wrap-around behaviour is undefined in C++. Too much programming in Assembler :)

We've just always treated cpu_timer as a wrapping counter. The absolute value of the counter is meaningless. It just allows us to count the number of cycles from one event to another. By subtracting an earlier value of cpu_timer from a later one you get the interval, which is a small positive integer value.

It wraps every 8.5 minutes or so according to the normal rules of binary arithmetic, which means it wraps nicely from MAX_INTEGER to MIN_INTEGER.

I don't agree this is the worst hack in Steem... there's much worse in there :-)
Steem Authors
Steem Developer
Posts: 534
Joined: Tue Apr 30, 2002 10:34 pm
Location: UK

Re: The worst hack in Steem

Postby DrCoolZic » Thu Dec 29, 2011 2:27 pm

Actually, signed or unsigned int would give the same results (otherwise the program won't work) as long as you only do arithmetic operations. But the result would be different if used in a comparison.

Re: The worst hack in Steem

Postby Steven Seagal » Mon Jan 02, 2012 7:53 pm

Steem Authors wrote:I don't agree this is the worst hack in Steem... there's much worse in there :-)

Gulp!

Re: The worst hack in Steem

Postby Steven Seagal » Sat Apr 07, 2012 7:57 am

Steem Authors wrote:I don't agree this is the worst hack in Steem... there's much worse in there :-)


Well, I found it: a worse hack, something horrible, so big I couldn't believe it's true, and it will be very hard to "fix", if it is even worth doing.
Steem organises confusion around scanline starting cycles by returning wrong values when programs read the SDP (video counter), and fools programs into taking the HBL timing (444 cycles into a scanline in Steem) as the first cycle of the scanline (512=0), that is, 68 cycles too soon vs. a true ST.
How precisely it happens I don't know, because my grasp of M68000 assembly is too limited. Obviously, programs read the SDP and use the value to synchronise shifter events.

To illustrate, I took the Amiga Demo. First I show that Steem & Hatari return different values for the SDP.
Then I show the cycles for right border removal in Hatari, in Steem, and in Steem "fixed" using Hatari's SDP. I think the proof of what I say is the 'FrameCycles' value. In Steem, it is 68 cycles smaller each time: the difference between 444 and 512.

Code:
TEX Amiga Demo overscan

Reading SDP
VBL 162 HblCounterVideo 69 Cycles since HBL 76 Read SDP Steem 403C4 Hatari 403C0
VBL 162 HblCounterVideo 69 Cycles since HBL 96 Read SDP Steem 403CE Hatari 403C0
VBL 162 HblCounterVideo 69 Cycles since HBL 116 Read SDP Steem 403D8 Hatari 403C0
VBL 162 HblCounterVideo 69 Cycles since HBL 136 Read SDP Steem 403E2 Hatari 403C0
VBL 162 HblCounterVideo 69 Cycles since HBL 156 Read SDP Steem 403EC Hatari 403CA
VBL 162 HblCounterVideo 69 Cycles since HBL 176 Read SDP Steem 403F6 Hatari 403D4
VBL 162 HblCounterVideo 69 Cycles since HBL 196 Read SDP Steem 40400 Hatari 403DE
VBL 162 HblCounterVideo 69 Cycles since HBL 216 Read SDP Steem 4040A Hatari 403E8
VBL 162 HblCounterVideo 69 Cycles since HBL 236 Read SDP Steem 40414 Hatari 403F2
VBL 162 HblCounterVideo 69 Cycles since HBL 256 Read SDP Steem 4041E Hatari 403FC
VBL 162 HblCounterVideo 69 Cycles since HBL 276 Read SDP Steem 40428 Hatari 40406

Right border removing, one random scanline

Hatari:

VBL 1842 HblCounterVideo 163 FrameCycles 83832  LineCycles 376
VBL 1843 HblCounterVideo 163 FrameCycles 83832  LineCycles 376
VBL 1844 HblCounterVideo 163 FrameCycles 83832  LineCycles 376
VBL 1845 HblCounterVideo 163 FrameCycles 83832  LineCycles 376
VBL 1846 HblCounterVideo 163 FrameCycles 83832  LineCycles 376
VBL 1847 HblCounterVideo 163 FrameCycles 83832  LineCycles 376

Steem:

VBL 95 HblCounterVideo 163 FrameCycles 83764  LineCycles 376 Since HBL 376 Since Scanline 308
VBL 96 HblCounterVideo 163 FrameCycles 83764  LineCycles 376 Since HBL 376 Since Scanline 308
VBL 97 HblCounterVideo 163 FrameCycles 83764  LineCycles 376 Since HBL 376 Since Scanline 308
VBL 98 HblCounterVideo 163 FrameCycles 83764  LineCycles 376 Since HBL 376 Since Scanline 308
VBL 99 HblCounterVideo 163 FrameCycles 83764  LineCycles 376 Since HBL 376 Since Scanline 308
VBL 100 HblCounterVideo 163 FrameCycles 83764  LineCycles 376 Since HBL 376 Since Scanline 308

Steem, using Hatari's Video_CalculateAddress():

VBL 40 HblCounterVideo 163 FrameCycles 83832  LineCycles 376 Since HBL 444 Since Scanline 376
VBL 41 HblCounterVideo 163 FrameCycles 83832  LineCycles 376 Since HBL 444 Since Scanline 376
VBL 42 HblCounterVideo 163 FrameCycles 83832  LineCycles 376 Since HBL 444 Since Scanline 376
VBL 43 HblCounterVideo 163 FrameCycles 83832  LineCycles 376 Since HBL 444 Since Scanline 376
VBL 44 HblCounterVideo 163 FrameCycles 83832  LineCycles 376 Since HBL 444 Since Scanline 376
VBL 45 HblCounterVideo 163 FrameCycles 83832  LineCycles 376 Since HBL 444 Since Scanline 376
VBL 46 HblCounterVideo 163 FrameCycles 83832  LineCycles 376 Since HBL 444 Since Scanline 376


You see what happens? Steem makes the program hit at cycle 308 instead of 376, and gleefully interprets this as 'right off' shifter trick.
Damn! It seems Steem 3.4 is not for tomorrow.
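
The numbers in the traces above can be cross-checked with a little arithmetic. This is only a sketch: it assumes that HblCounterVideo is a scanline index and that a 50 Hz scanline is 512 cycles long (the "512=0" figure above); frame_cycles is a made-up helper name.

```cpp
// Assumption: 512 CPU cycles per scanline at 50 Hz.
constexpr int kCyclesPerLine50Hz = 512;

// Hypothetical helper: frame cycle count for a position given as
// (scanline index, cycle within the line).
constexpr int frame_cycles(int line, int line_cycle) {
    return line * kCyclesPerLine50Hz + line_cycle;
}

// Line 163, cycle 376 reproduces the FrameCycles value in the Hatari trace.
static_assert(frame_cycles(163, 376) == 83832, "Hatari trace");
// Line 163, cycle 308 ("Since Scanline") reproduces the Steem trace.
static_assert(frame_cycles(163, 308) == 83764, "Steem trace");
// The two differ by 68 cycles, the same as 512 - 444.
static_assert(83832 - 83764 == 512 - 444, "68-cycle offset");
```

So the traces are at least internally consistent: the same position in the line is counted from two origins that are 68 cycles apart.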

Re: The worst hack in Steem

Postby npomarede » Sat Apr 07, 2012 9:44 am

Hello

In my experience Steem's timings for border removal are good: the next HBL is on cycle 512, not 444 (when running at 50 Hz), and right border removal can only happen at cycle 376. Are you sure the problem is not in your traces? If Steem detected border removal 68 cycles too soon, nothing would work at all.
What is the origin of your "since scanline" in Steem? Obviously this is not the start of the line, but something more like "since the display enable signal is on", which is much later. The best way to count cycles is by taking "since hbl", that is (more or less) when the electron beam is on the left of the screen, just starting a new line (cycle 512 of the previous hbl = cycle 0 of the current hbl).

If you use "since hbl", then all traces have the same 376 value, which is consistent with the fact that the right border is removed, so I don't see any problem.

Nicolas
npomarede
Atari Super Hero
Posts: 695
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: The worst hack in Steem

Postby Steven Seagal » Sun Apr 08, 2012 11:51 am

Of course most shifter tricks work in Steem, but it's thanks to the hbl (which IS set at 444, trust me)/scanline hack, check the 'framecycles' in my example above.

For example the amazing SoWatt/Sync demo works because the VBL routine is called right at the start of the frame, and not 60 or so cycles in the frame as on a true ST (VblVideoCycleOffset in Hatari).

Everything holds together, that's why it's hard to "fix", but it should be done for real cycle accuracy!

Re: The worst hack in Steem

Postby npomarede » Sun Apr 08, 2012 12:31 pm

No, I really think the VBL is occurring at the correct time in Steem too, otherwise the many top border removals that rely on a jitter of 4-8 cycles at most would not work.
Maybe some particular demos have specific interrupt situations where the VBL interrupt was delayed for more than one VBL, or where an HBL was "blocked" by the interrupt mask in SR and then happens immediately because it was in the "pending" state, instead of occurring at the usual position.
There are hundreds of demos that rely on this; basing your conclusion on a few that don't work is not good, imho.

You can try to delay the VBL by 60 cycles if you like, and you will see that almost no overscan demos work anymore. So Steem's timings for the VBL are good in all cases. The problem you saw might be due to a combination of simultaneous interrupts.

What I mean is that whatever formula Steem's code uses to compute shifter cycles, it manages to get the work done in maybe 98% of the cases. So perhaps the code in Steem is not as simple as it could be, but I really think it's correct; there must be other problems with the demos you refer to (and the fact that these demos sometimes needed a patch is an indication for me that it must be a very specific case, otherwise Steem's authors would not have left such obvious errors).

Nicolas

Re: The worst hack in Steem

Postby Steven Seagal » Sun Apr 08, 2012 3:59 pm

The VBL is called at once at the start of the frame, but scanlines start earlier than they should. The difference in cycles is about the same, so it's compensated. I'm sure other events are adjusted as well. I experimented with SoWatt/Sync. The VBL starts MFP timers that trigger somewhere in the scanline and read the SDP to adjust shifter tricks. If it triggers 60 cycles off, it fails.

The flicker bug in "Cool STE" is a consequence of the hack I describe. It can be fixed with other harmless hacks. I found other quirks as well.

Of course, for most cases, shifting display timing by 60 cycles out of 160256 is 0.0375%; it won't break everything.

Maybe I'm not clear enough, but I don't claim programs don't work. I claim events are triggered 60+ cycles earlier than normal. Maybe there's a good reason for that, namely the stability of the hbl in Steem, so I'm not sure yet what to do.

This is the event planner in Steem, where CYCLES_FOR_VERTICAL_RETURN_IN_50HZ is 444 (draw.cpp).
evp->event=event_hbl becomes the reference to compute scanline cycles (eg 372, 376, etc.).

Code:
bool draw_routines_init()
{
  {
    event_plan[0]=event_plan_50hz;
    event_plan[1]=event_plan_60hz;
    event_plan[2]=event_plan_70hz;
    event_plan_boosted[0]=event_plan_boosted_50hz;
    event_plan_boosted[1]=event_plan_boosted_60hz;
    event_plan_boosted[2]=event_plan_boosted_70hz;

    /* 50Hz:
      VBLs spaced by 160256 cycles (49.92Hz)

      444 cycles:  HBL interrupt
      444+n*512, n=1..312:  draw scanline to end, HBL interrupt
      160256:  VBL
    */
    screen_event_struct *evp=event_plan_50hz;
    evp->time=CYCLES_FOR_VERTICAL_RETURN_IN_50HZ;
    evp->event=event_hbl;
    evp++;
    for (int y=1;y<313;y++){
      evp->time=CYCLES_FOR_VERTICAL_RETURN_IN_50HZ + y*512;
      evp->event=event_scanline;
      evp++;
      if ((CYCLES_FOR_VERTICAL_RETURN_IN_50HZ+y*512) <= (160256-CYCLES_FROM_START_VBL_TO_INTERRUPT) &&
          (CYCLES_FOR_VERTICAL_RETURN_IN_50HZ+y*512+512) > (160256-CYCLES_FROM_START_VBL_TO_INTERRUPT)){
        evp->time=160256-CYCLES_FROM_START_VBL_TO_INTERRUPT;
        evp->event=event_start_vbl;
        evp++;
      }
    }
    evp->time=160256;
    evp->event=event_vbl_interrupt;
    evp++;
    evp->event=NULL;

    /* 60Hz:
      VBLs spaced by 133604 cycles (59.87Hz)

      444 cycles:  HBL interrupt
      444+n*508, n=1..262:  draw scanline to end, HBL interrupt
      133604:  VBL
    */
    evp=event_plan_60hz;
    evp->time=CYCLES_FOR_VERTICAL_RETURN_IN_60HZ;
    evp->event=event_hbl;
    evp++;
    for(int y=1;y<263;y++){
      evp->time=CYCLES_FOR_VERTICAL_RETURN_IN_60HZ + y*508;
      evp->event=event_scanline;
      evp++;
      if ((CYCLES_FOR_VERTICAL_RETURN_IN_60HZ+y*508) <= (133604-CYCLES_FROM_START_VBL_TO_INTERRUPT) &&
          (CYCLES_FOR_VERTICAL_RETURN_IN_60HZ+y*508+508) > (133604-CYCLES_FROM_START_VBL_TO_INTERRUPT)){
        evp->time=133604-CYCLES_FROM_START_VBL_TO_INTERRUPT;
        evp->event=event_start_vbl;
        evp++;
      }
    }
    evp->time=133604;
    evp->event=event_vbl_interrupt;
    evp++;
    evp->event=NULL;

    /* 70Hz:
      VBLs spaced by 112000 cycles (71.36Hz)

      200 cycles:  HBL interrupt
      200+n*224, n=1..452:  draw scanline to end, HBL interrupt
      112000:  VBL
    */
    evp=event_plan_70hz;
    for (int y=0;y < (SCANLINES_ABOVE_SCREEN_70HZ+400+SCANLINES_BELOW_SCREEN_70HZ);y++){
      evp->time=CYCLES_FOR_VERTICAL_RETURN_IN_70HZ + y*SCANLINE_TIME_IN_CPU_CYCLES_70HZ;
      evp->event=event_scanline;
      evp++;
      if ((CYCLES_FOR_VERTICAL_RETURN_IN_70HZ + y*SCANLINE_TIME_IN_CPU_CYCLES_70HZ) <= (112224-CYCLES_FROM_START_VBL_TO_INTERRUPT) &&
          (CYCLES_FOR_VERTICAL_RETURN_IN_70HZ + y*SCANLINE_TIME_IN_CPU_CYCLES_70HZ+SCANLINE_TIME_IN_CPU_CYCLES_70HZ)
             > (112224-CYCLES_FROM_START_VBL_TO_INTERRUPT)){
        evp->time=112224-CYCLES_FROM_START_VBL_TO_INTERRUPT;
        evp->event=event_start_vbl;
        evp++;
      }
    }
    evp->time=112224;
    evp->event=event_vbl_interrupt;
    evp++;
    evp->event=NULL;
  }

  PCpal=Get_PCpal();

  for(int a=0;a<3;a++)for(int b=0;b<4;b++)for(int c=0;c<3;c++)
    jump_draw_scanline[a][b][c]=draw_scanline_dont;

  // [0=Smallest size possible, 1=640x400 (all reses), 2=640x200 (med/low res)]
  //  [BytesPerPixel-1]
  //    [screen_res]

  jump_draw_scanline[0][0][0]=draw_scanline_8_lowres_pixelwise;
  jump_draw_scanline[0][0][1]=draw_scanline_8_medres_pixelwise;
  jump_draw_scanline[0][0][2]=draw_scanline_8_hires;
  jump_draw_scanline[0][1][0]=draw_scanline_16_lowres_pixelwise;
  jump_draw_scanline[0][1][1]=draw_scanline_16_medres_pixelwise;
  jump_draw_scanline[0][1][2]=draw_scanline_16_hires;
  jump_draw_scanline[0][2][0]=draw_scanline_24_lowres_pixelwise;
  jump_draw_scanline[0][2][1]=draw_scanline_24_medres_pixelwise;
  jump_draw_scanline[0][2][2]=draw_scanline_24_hires;
  jump_draw_scanline[0][3][0]=draw_scanline_32_lowres_pixelwise;
  jump_draw_scanline[0][3][1]=draw_scanline_32_medres_pixelwise;
  jump_draw_scanline[0][3][2]=draw_scanline_32_hires;

  jump_draw_scanline[1][0][0]=draw_scanline_8_lowres_pixelwise_400;
  jump_draw_scanline[1][0][1]=draw_scanline_8_medres_pixelwise_400;
  jump_draw_scanline[1][0][2]=draw_scanline_8_hires;
  jump_draw_scanline[1][1][0]=draw_scanline_16_lowres_pixelwise_400;
  jump_draw_scanline[1][1][1]=draw_scanline_16_medres_pixelwise_400;
  jump_draw_scanline[1][1][2]=draw_scanline_16_hires;
  jump_draw_scanline[1][2][0]=draw_scanline_24_lowres_pixelwise_400;
  jump_draw_scanline[1][2][1]=draw_scanline_24_medres_pixelwise_400;
  jump_draw_scanline[1][2][2]=draw_scanline_24_hires;
  jump_draw_scanline[1][3][0]=draw_scanline_32_lowres_pixelwise_400;
  jump_draw_scanline[1][3][1]=draw_scanline_32_medres_pixelwise_400;
  jump_draw_scanline[1][3][2]=draw_scanline_32_hires;

  jump_draw_scanline[2][0][0]=draw_scanline_8_lowres_pixelwise_dw;
  jump_draw_scanline[2][0][1]=draw_scanline_8_medres_pixelwise;
  jump_draw_scanline[2][0][2]=draw_scanline_8_hires;
  jump_draw_scanline[2][1][0]=draw_scanline_16_lowres_pixelwise_dw;
  jump_draw_scanline[2][1][1]=draw_scanline_16_medres_pixelwise;
  jump_draw_scanline[2][1][2]=draw_scanline_16_hires;
  jump_draw_scanline[2][2][0]=draw_scanline_24_lowres_pixelwise_dw;
  jump_draw_scanline[2][2][1]=draw_scanline_24_medres_pixelwise;
  jump_draw_scanline[2][2][2]=draw_scanline_24_hires;
  jump_draw_scanline[2][3][0]=draw_scanline_32_lowres_pixelwise_dw;
  jump_draw_scanline[2][3][1]=draw_scanline_32_medres_pixelwise;
  jump_draw_scanline[2][3][2]=draw_scanline_32_hires;

  draw_scanline=draw_scanline_dont;
//  palette_convert_entry=palette_convert_16_565;

  osd_routines_init();

  return true;
}
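
For readers who don't want to wade through the whole function, the 50 Hz part of the plan boils down to something like the following. This is a simplified sketch, not Steem's code: the type and function names are invented, and the event_start_vbl entry is left out because the value of CYCLES_FROM_START_VBL_TO_INTERRUPT is not given here.

```cpp
#include <vector>

// Simplified stand-ins for Steem's event types; names are hypothetical.
enum class Event { Hbl, Scanline, VblInterrupt };

struct PlannedEvent {
    int time;     // CPU cycles since the start of the frame
    Event event;
};

// Builds a 50 Hz plan with the timings from the comment in the code above:
// first HBL at 444, then one scanline event every 512 cycles for
// n = 1..312, then the VBL interrupt at 160256.
std::vector<PlannedEvent> build_plan_50hz() {
    const int first_hbl = 444;
    const int line_cycles = 512;
    const int frame_cycles = 160256;  // 313 lines * 512 cycles

    std::vector<PlannedEvent> plan;
    plan.push_back({first_hbl, Event::Hbl});
    for (int y = 1; y < 313; ++y) {
        plan.push_back({first_hbl + y * line_cycles, Event::Scanline});
    }
    plan.push_back({frame_cycles, Event::VblInterrupt});
    return plan;
}
```

With these figures the last scanline event falls at 444 + 312*512 = 160188, which is 68 cycles before the VBL at 160256.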


Re: The worst hack in Steem

Postby Steven Seagal » Sun Apr 08, 2012 5:05 pm

And to illustrate that it's mainly for "inner working" (and because we like this demo):

Original

Image

Scanline, SDP & VBL cycles fixed:

Image

(edit: the fixed version was worse, I rectified it)
You shouldn't spot any difference except in my timing for hitting pause.

Edit: in fact I'm investigating this problem because I fixed the 3615 Gen 4 demo with a torn scroller, but noticed a glitch in the left border. Right now, the exact same glitch is present with more "correct" cycles, so it was a false lead or they're still not "correct". Not sure I will go on in this way.

Re: The worst hack in Steem

Postby npomarede » Sun Apr 08, 2012 9:12 pm

Steven Seagal wrote:Of course, for most cases, shifting display timing by 60 cycles out of 160256 is 0.0375%; it won't break everything.

The fact that it would break even one program would be enough to tell that the emulation is not correct. I don't have the list here, but even changing the VBL by 4 cycles could completely break a few demos. So on the contrary, the fact that they work today in Steem leads me to believe that even if Steem uses another way to count cycles (see below), it gets the right result in the end and the inner logic should be correct.

Steven Seagal wrote:Maybe I'm not clear enough, but I don't claim programs don't work. I claim events are triggered 60+ cycles earlier than normal. Maybe there's a good reason for that, namely the stability of the hbl in Steem, so I'm not sure yet what to do.
This is the event planner in Steem, where CYCLES_FOR_VERTICAL_RETURN_IN_50HZ is 444 (draw.cpp).

From Steem's code, this is in fact similar to the way Hatari counts hbls:
- in Steem, hbls are relative to the start of the VBL. The first one is at cycle 444.
- in Hatari, hbls (and vbl) are relative to cycle "0", where the electron beam would be at the top left corner of the screen. So the first HBL is at cycle 512.
So we could say Hatari's timings use absolute coordinates instead of relative ones.

But as the VBL in fact starts at cycle 64 (absolute), in the end both Steem and Hatari have a 1st HBL at cycle 512 (absolute).
So I would not say events are triggered 60 cycles earlier than normal; they're just counted in a different, relative system rather than the absolute one used in Hatari.
In Steem, if all computations are made relative to the start of the VBL, then shifter tricks will work, which is the case.

What you call cycle 444 is not absolute cycle 444, it's cycle 444 relative to the VBL's position (64 cycles or so depending on jitter), so it's the equivalent of Hatari's absolute 512 cycles.
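
The change of origin can be written down as a one-line conversion. This is a sketch with an invented helper name; the offset is a parameter because only an approximate value ("64 cycles or so") is given, and the 68 used in the checks is simply the 512 - 444 difference seen in the traces earlier in the thread.

```cpp
// Hypothetical helper: converts a cycle position counted from the VBL
// (Steem's origin, as described above) to one counted from the top-left
// beam position (Hatari's origin). 'vbl_offset' is the absolute cycle at
// which the VBL occurs.
constexpr int to_absolute_cycle(int cycle_since_vbl, int vbl_offset) {
    return cycle_since_vbl + vbl_offset;
}

// With a 68-cycle offset, Steem's first HBL at relative cycle 444 lands on
// absolute cycle 512, and relative cycle 308 lands on 376.
static_assert(to_absolute_cycle(444, 68) == 512, "first HBL");
static_assert(to_absolute_cycle(308, 68) == 376, "right border position");
```

Seen this way, the two emulators describe the same instants with two different zero points.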

Maybe Steem is missing some cases for a few rare overscan effects, but its timings and general architecture are correct, they just use a different origin for counting video events.

Nicolas
npomarede
Atari Super Hero
Posts: 695
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: The worst hack in Steem

Postby Steven Seagal » Mon Apr 09, 2012 10:10 am

I still think that the cycles are forward in the frame, but it doesn't seem to matter, since all the rest is forward too. The VBL starts at once, first scanline starts sooner, everything is in sync. I tried to shift it back, but it's too great a task.
The only thing I'll keep from this, I think, is that I will use the Hatari routine to read the SDP (Video_CalculateAddress) from now on, I like to hack!
Like I said, it was part of efforts to fix some demo: http://www.pouet.net/prod.php?which=26986.
It writes to the scanline being fetched! But in Steem at least, it's just after the pointer has passed (upper part of scroller), and it starts at cycle 88, that is, well into the line. That's why I thought maybe the demo meant hbl (444) + 88, before display starts, but now I realise it's impossible, or the shifter tricks (L&R border) would be off.

Image

I was annoyed by the glitch because when you pause the YouTube video, it doesn't show.
http://www.youtube.com/watch?v=nA3IIWCP1eM
But now, watching it going on, it seems to me there's some break at the border too (the more I look the more I see it), only when you stop Steem it renders what's currently in video RAM. Sounds logical, or wishful thinking?
In other words, I'm afraid I've been wasting time on this!
In the CIA we learned that ST ruled
Steven Seagal
Atari Super Hero
Posts: 928
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed

Re: The worst hack in Steem

Postby npomarede » Mon Apr 09, 2012 10:44 am

Yes, the GEN4 demo is a very special one to emulate. I already had a look at it some time ago, and even gunstick from ULM joined the discussion (there must be a thread about it on atari-forum, can't remember where).

The fact is that to save memory, this demo is using a single buffer, which is completely correct when your code is running in sync with the video. As soon as some pixels have been displayed, you're free to write immediately to the video space to prepare the next image to be displayed.

In the case of Hatari, we render one full line at a time. So when the demo changes a line that it expects to have already been displayed on real hardware, Hatari has not displayed that line yet, and the change creates this shift where you see 2 different images on the same screen.

Steem renders lines in small parts each time some video related registers are changed (color, freq, res). This basically gives the same result as Hatari.
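
The difference between the two display orders can be shown with a toy model. This is only an illustration under simplifying assumptions (one character per "pixel", a single write per line); none of the names come from Hatari or Steem.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Beam-accurate display: each pixel is output when the beam reaches it.
// The demo's write happens when the beam is at beam_pos_at_write, so a
// write behind the beam is not visible on this frame.
inline std::string DisplayWithBeam(std::string memory, std::size_t write_pos,
                                   char new_value, std::size_t beam_pos_at_write) {
    assert(write_pos < memory.size());
    std::string shown;
    for (std::size_t x = 0; x < memory.size(); ++x) {
        if (x == beam_pos_at_write) memory[write_pos] = new_value;
        shown += memory[x];
    }
    return shown;
}

// Line-at-a-time display: the whole line is rendered only after the CPU
// has run for the duration of the line, so the write is already in memory
// and shows up even though the beam had passed that pixel.
inline std::string DisplayWholeLine(std::string memory, std::size_t write_pos,
                                    char new_value) {
    assert(write_pos < memory.size());
    memory[write_pos] = new_value;
    return memory;
}
```

Starting from a line of eight 'A' pixels, a write of 'B' to pixel 2 while the beam is at pixel 5 leaves the beam-accurate output unchanged, while the line-at-a-time output already contains the 'B'. That mismatch is the "2 different images" effect.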

To correctly render this demo, the solutions would be:
- to detect that a write to memory overlaps with the line currently being emulated and process it specially (using a single buffer is common, but having the same line written to while it is displayed is much rarer). But this is quite costly, as you need to intercept all writes to memory and see if they "collide" with the current video pointer address. This also means you need an accurate way to measure at which cycle you currently are in the line (whether using absolute or relative cycles). In that case, any write to the overlapping memory has to be treated as a write to a video related register, and you need to update the screen immediately. In the end, this means you will draw the line in small chunks of 16 pixels, which will certainly slow down emulation a lot in the case of this demo.

- another solution is to use a bus driven approach (instead of cpu driven, as done in WinUAE's most accurate mode): give 4 cycles to the CPU, 4 cycles to the MMU/shifter and draw 4 pixels (you need to share bus cycles with blitter, disk DMA, sound DMA, ... too). This is the most generic way to run the emulation and the closest to how the hardware really works, but it will be sloooowww; it really requires a lot of CPU.
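
For the first solution, the "collision" test could look roughly like the sketch below. The struct and function names are hypothetical, and the timing figures in the usage note (display starting at cycle 56, one 16-bit word fetched every 4 cycles, 160 bytes per line) are assumed values for a plain 50 Hz low resolution line; overscan lines would need other values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one scanline's fetch timing.
struct LineTiming {
    int display_start_cycle;  // cycle in the line where fetching starts
    int cycles_per_word;      // cycles needed to fetch one 16-bit word
    int bytes_per_line;       // bytes fetched for one scanline
};

// Number of bytes of the current line already fetched at cycle_in_line.
inline int BytesFetchedSoFar(const LineTiming& t, int cycle_in_line) {
    assert(t.cycles_per_word > 0);
    if (cycle_in_line <= t.display_start_cycle) return 0;
    const int words = (cycle_in_line - t.display_start_cycle) / t.cycles_per_word;
    const int bytes = words * 2;
    return bytes < t.bytes_per_line ? bytes : t.bytes_per_line;
}

// True if a write of write_size bytes touches the memory of the line
// currently being emulated.
inline bool WriteHitsCurrentLine(std::uint32_t write_addr, std::uint32_t write_size,
                                 std::uint32_t line_start_addr, const LineTiming& t) {
    const std::uint32_t line_end =
        line_start_addr + static_cast<std::uint32_t>(t.bytes_per_line);
    return write_addr < line_end && write_addr + write_size > line_start_addr;
}

// True if the written address lies in the part of the line that has
// already been fetched (i.e. behind the video pointer).
inline bool WriteIsBehindBeam(std::uint32_t write_addr, std::uint32_t line_start_addr,
                              const LineTiming& t, int cycle_in_line) {
    const std::uint32_t fetched_end =
        line_start_addr + static_cast<std::uint32_t>(BytesFetchedSoFar(t, cycle_in_line));
    return write_addr >= line_start_addr && write_addr < fetched_end;
}
```

With `LineTiming{56, 4, 160}`, 16 bytes have been fetched at cycle 88, so a write 8 bytes into the line is behind the video pointer and a write 32 bytes into the line is still ahead of it. The costly part is not this test itself but calling it on every memory write.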

So, even if the GEN4 demo might not be the most brainblasting one (but I really enjoyed it back in the day, especially the music, which uses Rob Hubbard's replay but seems to have been made with ULM's own editor?), I consider it to be one of the most complex demos remaining to emulate, because it involves this "display while you write in the memory" effect. If Hatari ever emulates it, I think it will be through the 2nd method (bus driven approach).
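
As a rough idea of what the bus driven approach means, here is a toy loop. It is only a sketch under strong assumptions: one "slot" gives the CPU one turn and then lets the shifter fetch and output one word, the slot length in cycles is left out on purpose, and the blitter and DMA are not modelled. `ToyBus` and `RunBusSlots` are made-up names, not Hatari or WinUAE code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Toy machine state: word-addressed RAM plus the words sent to the screen.
struct ToyBus {
    std::vector<std::uint16_t> ram;
    std::vector<std::uint16_t> displayed;
};

// Runs `slots` bus slots. In each slot the CPU callback runs first (it may
// write to RAM), then the shifter fetches the word at the video counter and
// outputs it. Because display is interleaved with CPU work, a write behind
// the video counter cannot change what was already shown.
inline void RunBusSlots(ToyBus& bus, std::size_t video_counter, int slots,
                        const std::function<void(ToyBus&, int)>& cpu_step) {
    for (int slot = 0; slot < slots; ++slot) {
        cpu_step(bus, slot);
        assert(video_counter < bus.ram.size());
        bus.displayed.push_back(bus.ram[video_counter]);
        ++video_counter;
    }
}
```

If the CPU callback writes to word 0 and word 3 during slot 2, the output still shows the old word 0 (already displayed) but the new word 3 (not fetched yet), which is the behaviour the demo relies on. The price is one pass through the loop per slot instead of one per line.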

Nicolas
npomarede
Atari Super Hero
Posts: 695
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: The worst hack in Steem

Postby Steven Seagal » Mon Apr 09, 2012 3:58 pm

I figure it's quite a bit easier to emulate in Steem thanks to the clever video system: you just need to force rendering when MOVE.W ..., (d16, An) writes to the SDP zone, just as if it were a shifter event.
I think the left border of the scroller wasn't "1 VBL" compliant, but it's hard to tell on YouTube, the framerate is different anyway. Check it if you fix it in Hatari.
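
A toy version of that idea, to show what I mean. This is not Steem's actual code; the class and its methods are invented for the example, one character stands for one pixel, and every write is assumed to land in the line being displayed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Renders a line lazily: pixels are only drawn when something forces it.
class LazyLineRenderer {
public:
    explicit LazyLineRenderer(std::string video_ram) : ram_(std::move(video_ram)) {}

    // Draws every pixel between the last rendered position and beam_pos.
    void RenderUpTo(std::size_t beam_pos) {
        if (beam_pos > ram_.size()) beam_pos = ram_.size();
        while (rendered_ < beam_pos) {
            screen_ += ram_[rendered_];
            ++rendered_;
        }
    }

    // Memory write hook: the write is handled like a shifter event, so the
    // pixels the beam has already passed are drawn before memory changes.
    void Write(std::size_t pos, char value, std::size_t beam_pos) {
        assert(pos < ram_.size());
        RenderUpTo(beam_pos);
        ram_[pos] = value;
    }

    const std::string& screen() const { return screen_; }

private:
    std::string ram_;
    std::string screen_;
    std::size_t rendered_ = 0;
};
```

With a line of eight 'A' pixels and the beam at pixel 5, writing 'B' to pixel 2 leaves the rendered line unchanged, while writing 'B' to pixel 6 shows up once the rest of the line is rendered.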

About the VBL timing, one of my glorious hacks is to reduce the timing by 4 (52 instead of 56). It helps some programs and breaks nothing, maybe because on an STF the delay before execution is 64, while in Steem it's hardwired as 68. Just a thought.

Edit

npomarede wrote:- another solution is to use a bus driven approach (instead of cpu driven, as done in WinUAE's most accurate mode): give 4 cycles to the CPU, 4 cycles to the MMU/shifter and draw 4 pixels (you need to share bus cycles with blitter, disk DMA, sound DMA, ... too). This is the most generic way to run the emulation and the closest to how the hardware really works, but it will be sloooowww; it really requires a lot of CPU.


Surely you mean 2 cycles for the CPU, 2 for the shifter?
In the CIA we learned that ST ruled
Steven Seagal
Atari Super Hero
Posts: 928
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed

Re: The worst hack in Steem

Postby Steven Seagal » Mon Apr 09, 2012 4:12 pm

npomarede wrote:In the case of Hatari, we render one full line at a time. So when the demo changes a line that it expects to have already been displayed on real hardware, Hatari has not displayed that line yet, and the change creates this shift where you see 2 different images on the same screen.


2 different images? I'm afraid you can't beat Steem SSE:

Image

Part of development woes. You may recognise "Cool STE" and "Tekila".
In the CIA we learned that ST ruled
Steven Seagal
Atari Super Hero
Posts: 928
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed
