One last thing


blabla
Atarian
Posts: 9
Joined: Tue Sep 06, 2016 7:30 pm

Postby blabla » Mon Apr 16, 2018 3:08 pm

I've spent all day and night trying to trim down the executable and speed it up by up to a factor of 10.
And how does the Atari Falcon thank me ?
"UGH DUH, I can't handle it, slow down pls."

"But there's a 256 color mode !", i hear you say.
Except it's planar, not chunky, making the whole thing kind of pointless since i would need to convert the buffer to planar.
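
For readers unfamiliar with the planar/chunky distinction, here is a minimal sketch of the conversion being complained about. It assumes the ST-style interleaved layout, where each group of 16 pixels is stored as one 16-bit word per plane with the leftmost pixel in bit 15. The function name `c2p_group` is hypothetical, and this naive version only illustrates the data layout; it is far too slow for per-frame use on real hardware.

```c
#include <stdint.h>

#define PLANES 8             /* 256 colours need 8 bitplanes */
#define PIXELS_PER_GROUP 16  /* one 16-bit word per plane covers 16 pixels */

/* Convert 16 chunky pixels (one byte each) into 8 plane words.
 * Plane p collects bit p of every pixel; bit 15 is the leftmost pixel. */
static void c2p_group(const uint8_t chunky[PIXELS_PER_GROUP],
                      uint16_t planar[PLANES])
{
    for (int p = 0; p < PLANES; p++) {
        uint16_t word = 0;
        for (int x = 0; x < PIXELS_PER_GROUP; x++) {
            uint16_t bit = (uint16_t)((chunky[x] >> p) & 1u);
            word = (uint16_t)(word | (bit << (15 - x)));
        }
        planar[p] = word;
    }
}
```

Every output word depends on all 16 input pixels, which is why a chunky buffer cannot simply be copied to a planar screen.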

I used to be gay for the Atari falcon, i really was. For once, i thought it was a better Amiga 1200.
But he never returned the love i had for him.

Maybe we were never meant to be together : after all, i would try to avoid assembly at all cost.

I will have to spend the rest of my life begging for money on the Google Play Store by making crappy games.
I will never know the truth behind it.... But none of this really matters.

Because as far as the world is concerned, i'm just a lame indie dev among billions.

Give Evil Australians, my latest game, a try on your Atari Falcon, should work on a stock Falcon.
It works on aranym & hatari but i know it will crash on real hardware. Always...

Binary : https://github.com/gameblabla/evilaustralians/releases/tag/2
Source code : https://github.com/gameblabla/evilaustralians/tree/master/Consoles_port/Falcon_port
Video : https://www.youtube.com/watch?v=jy8x1Qe9oS0


Story

Code: Select all

John, living in Australia, is an avid video game collector.
However, he learns from the news that the Australian government is soon going to ban video games.
John became furious :
he grabs his weapons and swears vengeance on the government.

However 7 years ago,
the government collected his own DNA as part of a secret project.
This secret project allowed the government to steal DNAs from all the aussies.
The Government learned of John's plan and they used his DNA to build an entire clone army of himself.

John is now facing an entire army of himself,
will he make it alive and finally kill the law ?
Last edited by blabla on Tue Apr 17, 2018 3:55 am, edited 2 times in total.

christos
Fuji Shaped Bastard
Posts: 2385
Joined: Tue Apr 13, 2004 8:24 pm
Location: Greece

Postby christos » Mon Apr 16, 2018 7:08 pm

Saw the video. Looks fun. I wish my falcon was still alive. I'll give it a try on hatari.

Thanks for the release! I am sure with a little bit of fiddling you can get it to as good a speed as you want. It's just that the Falcon is a bit strange in its graphics so you kind of need to work with it :)

BTW, the A1200 is also using bitplanes :P
Felix qui potuit rerum cognoscere causas.
My Atari blog

STOT Email address: stot(NoSPAM)atari(DOT)org

mlynn1974
Captain Atari
Posts: 194
Joined: Mon Mar 03, 2008 10:33 pm

Postby mlynn1974 » Mon Apr 16, 2018 9:24 pm

I don't have a Falcon, but I tried it on Hatari. The music is excellent and the game scrolling is very smooth. The collision detection seems a bit iffy in that I couldn't shoot anyone. Maybe I'm just too slow.

Hatari emulating a 68030 Falcon, 4Mb RAM, no DSP and EmuTOS on my little N4200 laptop is very slow but the game is playable.
I was wondering if the MP2 player consumed too much processor time but that doesn't seem to be the case.
I take it if no DSP is available it uses DMASound? That shouldn't take much processor time.

Have you looked into using DML's Atari Game Tools? That might give you a nice speed boost.
Still got, still working: Atari 4Mb STe, 520STFM, 2.5Mb STF.
Hardware: Cumana CSA 354, Ultimate Ripper, Blitz Turbo, Synchro Express II (US and UK Versions).

blabla
Atarian
Posts: 9
Joined: Tue Sep 06, 2016 7:30 pm

Postby blabla » Mon Apr 16, 2018 10:30 pm

christos wrote:Thanks for the release! I am sure with a little bit of fiddling you can get it to as good a speed as you want. It's just that the Falcon is a bit strange in its graphics so you kind of need to work with it :)

The narrow (16-bit) system bus is what kills it: it's simply too slow for the True Color mode and, to make it worse, there is no chunky mode for 256 colors. That's the reason why a CT60 is infinitely faster.
I'm not sure i can make it run faster without external help...

christos wrote:BTW, the A1200 is also using bitplanes :P

Yes, and not only does it also use bitplanes, the AGA chipset is also stuck at the incredible speed of 7 MHz...
That's why i was expecting the Falcon to be faster after seeing that 68030 in there until i looked at it closer...

mlynn1974 wrote:The music is excellent and the game scrolling is very smooth

Happy to hear that you love the music and the smooth scrolling. (well at least on Hatari...)

mlynn1974 wrote:The collision detection seems a bit iffy in that I couldn't shoot anyone. Maybe I'm just too slow.

I am very surprised to hear that. I gave it a try just now on Hatari/Aranym and it works fine.
It's not perfect but i never got reports (on any of my versions) that it made things unplayable.
Maybe you're using an old Hatari version that had a CPU bug ?

mlynn1974 wrote:I was wondering if the MP2 player consumed too much processor time but that doesn't seem to be the case.

ev_nodsp.prg does not use the DSP at all, it uses raw files and STE sound.
ev_dsp.tns is a bit slower (due to its size, among other reasons) and uses the DSP for non-gameplay segments.
STE sound is used for gameplay. (Because i found out that you can't use Dosound while using the DSP for sound)
The YM chip is used for the sound effect. (yes, only one in the entire game)

In all cases, the issue is not with the MP2 player or even raw sound files because the mp2 player uses the DSP and raw sound files cost almost no CPU time.
What's causing the slowdown is the true color mode on the Falcon.
It is nearly impossible to constantly refresh a 320x240 screen at 60 FPS or even 50 FPS in True Color.
I even tested this with a simple test program and i could sadly confirm that's the case, at least according to Hatari with cycle exact emulation enabled.
(I checked my task manager and Hatari was only using 8% of my 4 cores, so the host PC was not the bottleneck)
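
To put a number on the refresh problem, here is a small back-of-the-envelope sketch of how many bytes have to be written when every frame is fully redrawn. It assumes 2 bytes per pixel for True Color and counts screen writes only; reading the source bitmap roughly doubles the traffic. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to redraw one full frame. */
static uint32_t frame_bytes(uint32_t width, uint32_t height,
                            uint32_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

/* Bytes written per second when every frame is fully redrawn. */
static uint32_t redraw_bytes_per_second(uint32_t width, uint32_t height,
                                        uint32_t bytes_per_pixel, uint32_t fps)
{
    return frame_bytes(width, height, bytes_per_pixel) * fps;
}
```

For 320x240 in True Color this gives 153,600 bytes per frame, or 7,680,000 bytes per second at 50 FPS and 9,216,000 at 60 FPS. The same screen at 8 bits per pixel needs half of that.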

mlynn1974 wrote:Have you looked into using DMLs Atari Game Tools? That might give you a nice speed boost.

I actually took a look at it and i don't have good things to say about it...
Several issues with it :
- It does not support any Falcon features, 16 colors only.
- Little to no sound support.
- The resulting executable is more than 120Kb for most samples !
- It is C++... And it cannot be easily converted to C. I prefer minimalism when possible.
- The way it's implemented is very inefficient (especially the dictionary) and would only add bloat to my code.

On the plus side, the blitter code (especially the scrolling) is pretty nice. I thought about taking some of the C/ASM code so i could trim it down to my needs for an STE version but that's going to be a very hard thing to do...

I appreciate the feedback. Next time i work on a project for the Falcon, it's probably going to be more static...

mlynn1974
Captain Atari
Posts: 194
Joined: Mon Mar 03, 2008 10:33 pm

Postby mlynn1974 » Mon Apr 16, 2018 11:01 pm

blabla wrote:Maybe you're using an old Hatari version that had a CPU bug ?

No, it's just that Shift is the fire key and Sticky Keys on Windows 10 was causing the problem! The last time I used Hatari was on Windows Vista. I've turned Sticky Keys off.

blabla wrote:On the plus side, the blitter code (especially the scrolling) is pretty nice

I haven't checked the source code for the game but it would be a good idea to look into using the blitter as a starting point to speeding up things.

blabla wrote:It is simply nearly impossible to constantly refresh a 320x240 screen at 60 FPS or even 50 FPS in True Color.

Yes I tried a 320x200 pixel fader in True Color mode and had to use a lookup table to get a decent speed. I don't have enough experience of programming the Falcon to advise on that but I would be interested to see future versions of your game.
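
As an illustration of the lookup-table idea, here is one way such a fader could be structured. This is a sketch under assumptions, not necessarily how the fader mentioned above worked: it assumes a 5-6-5 RGB pixel layout, and `FADE_MAX`, `fade_pixel`, `build_fade_table` and `fade_buffer` are hypothetical names. A full table costs 128 KB, so a real Falcon version might prefer smaller per-channel tables.

```c
#include <assert.h>
#include <stdint.h>

#define FADE_MAX 16u  /* hypothetical number of fade steps */

/* Scale each colour channel of one 5-6-5 pixel by level/FADE_MAX. */
static uint16_t fade_pixel(uint16_t pixel, unsigned level)
{
    assert(level <= FADE_MAX);
    unsigned r = (pixel >> 11) & 0x1Fu;
    unsigned g = (pixel >> 5) & 0x3Fu;
    unsigned b = pixel & 0x1Fu;
    r = r * level / FADE_MAX;
    g = g * level / FADE_MAX;
    b = b * level / FADE_MAX;
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Precompute the faded value of every possible 16-bit pixel once. */
static void build_fade_table(uint16_t table[65536], unsigned level)
{
    for (uint32_t p = 0; p < 65536u; p++)
        table[p] = fade_pixel((uint16_t)p, level);
}

/* Per-pixel work is now a single table read instead of three multiplies. */
static void fade_buffer(uint16_t *pixels, uint32_t count,
                        const uint16_t table[65536])
{
    for (uint32_t i = 0; i < count; i++)
        pixels[i] = table[pixels[i]];
}
```

The expensive arithmetic runs once per fade level while the table is built, and the inner loop only does memory reads and writes.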

From a graphics point of view it would be nice to change the hero to be a different sprite and some faster enemies Bubble Bobble style would be interesting.

Atari030
Atari Super Hero
Posts: 600
Joined: Mon Feb 27, 2012 6:14 am
Location: Melbourne, Australia

Postby Atari030 » Mon Apr 16, 2018 11:43 pm

Evil Australians? What a novel title, what is the back story?

blabla
Atarian
Posts: 9
Joined: Tue Sep 06, 2016 7:30 pm

Postby blabla » Tue Apr 17, 2018 3:59 am

mlynn1974 wrote:I haven't checked the source code for the game but it would be a good idea to look into using the blitter as a starting point to speeding up things.

And how could the blitter help speed things up on the Falcon ? I never heard of the blitter chip being used for Falcon games...
Wouldn't that be slower ?

Btw, this is how the background is drawn.

Code: Select all

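/* Draws the level background: copies one screen-wide row per bitmap row,
   starting "scroll" pixels into the level bitmap. No clipping is done. */
void Notrans_DrawSprite_NoChecks_scroll(BITMAP* b, u16* scr, unsigned short scroll )
{
   u16* dst = (u16*)scr;
   u16* src = (u16*)b->data;
   u16 row = b->height;
   
   while (row--)
   {
      VFastCopy32(src+scroll, dst, 640);   /* 640 bytes = 320 pixels x 2 bytes */
      dst += (RL_SCREEN_WIDTH);
      src += b->width;
   }
}

_VFastCopy32:   ; Size must be a multiple of 32
; a0 = src
; a1 = dst
; d0 = size in bytes
   movem.l   d2-d7/a2,-(a7)   ; save the callee-saved registers clobbered below
   lsr.l   #5,d0            ; size / 32 = number of 32-byte blocks
   subq.l   #1,d0            ; dbra runs count+1 times
.copy:   movem.l   (a0)+,d1-d7/a2   ; read 8 longwords (32 bytes)
   movem.l   d1-d7/a2,(a1)      ; write them (this mode does not advance a1)
   lea   32(a1),a1
   dbra   d0,.copy
   movem.l   (a7)+,d2-d7/a2
   rts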

I used to draw each tile on screen before but that was even slower plus i needed to clip them...
The whole bitmap for each level is loaded in memory and i just scroll through it.
Obviously this won't be a solution should i want to make multi-scrolling games in the future...
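
For reference, the same "scroll through one big bitmap" idea can be written in portable C as below. This is only a sketch for checking the logic on a PC: `LevelBitmap` is a hypothetical stand-in for the game's `BITMAP` type, and unlike the original "NoChecks" routine it asserts that the visible window stays inside the bitmap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the game's BITMAP structure. */
typedef struct {
    uint16_t width;        /* bitmap width in pixels */
    uint16_t height;       /* bitmap height in rows */
    const uint16_t *data;  /* 16-bit pixels, row-major */
} LevelBitmap;

/* Copy a screen_width-wide window, starting scroll pixels from the left,
 * from the level bitmap to the screen buffer, one row at a time. */
static void draw_scrolled(const LevelBitmap *b, uint16_t *screen,
                          uint16_t screen_width, uint16_t scroll)
{
    assert((uint32_t)scroll + screen_width <= b->width);
    const uint16_t *src = b->data + scroll;
    uint16_t *dst = screen;
    for (uint16_t row = 0; row < b->height; row++) {
        memcpy(dst, src, (size_t)screen_width * sizeof *dst);
        dst += screen_width;  /* next screen row */
        src += b->width;      /* next bitmap row, same horizontal offset */
    }
}
```

Scrolling costs nothing extra with this layout because only the starting offset changes; the price is that the whole level has to sit in memory as pixels.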

mlynn1974 wrote:From a graphics point of view it would be nice to change the hero to be a different sprite and some faster enemies Bubble Bobble style would be interesting.

I'm considering adding more levels and changing a few graphics here and there. Maybe even adding new weapons...
But not feeling like it right now <_<

Atari030 wrote:Evil Australians? What a novel title, what is the back story?

Yes, i had to make a game about my hatred for dem aussies.
I have updated the first post, it includes the whole backstory.

Atari030
Atari Super Hero
Posts: 600
Joined: Mon Feb 27, 2012 6:14 am
Location: Melbourne, Australia

Postby Atari030 » Tue Apr 17, 2018 5:23 am

I will give it a whirl on my Falcons. :-) Luckily my name isn't John, but I do know a few.

dhedberg
Atari Super Hero
Posts: 784
Joined: Mon Aug 30, 2010 8:36 am

Postby dhedberg » Tue Apr 17, 2018 12:38 pm

blabla wrote:
mlynn1974 wrote:I haven't checked the source code for the game but it would be a good idea to look into using the blitter as a starting point to speeding up things.

And how could the blitter help speed things up on the Falcon ? I never heard of the blitter chip being used for Falcon games...
Wouldn't that be slower ?

I don't see much use of the blitter in the TrueColor mode, but in bitplane modes it beats the CPU in many situations.

blabla wrote:Btw, this is how the background is drawn.

Here are some optimizations you may want to consider.
- Use add.l Dn,a1 rather than lea 32(a1),a1 to avoid the effective address calculation (CEA) cost.
- Partially unroll the loop (the i-cache is 256 bytes). In this example you pay for 20 dbf per 640-byte row. Create a VFastCopy128 and unroll the loop 4 times.
- Use more registers in the movem.l.
- Disable the data cache (if enabled) while copying/clearing a lot of memory.

Actually, in your case it's probably faster to just use a lot of move.l (a0)+,(a1)+ instead of the movem.l to avoid the add.l/lea.

blabla wrote:I used to draw each tile on screen before but that was even slower plus i needed to clip them...
The whole bitmap for each level is loaded in memory and i just scroll through it.
Obviously this won't be a solution should i want to make multi-scrolling games in the future...

There's a hardware limit on how large your virtual screen can be. In TrueColor you can have about 3 screens of 320 pixels each. So just loading a large bitmap and scrolling through it will not only require a lot of memory, it will also restrict the size of your "worlds/maps".

I totally agree that in general the TrueColor mode is a bit too much for a stock Falcon030, but for certain games it works and makes implementation easier. In the first preview of our game Willie's Adventures we used the TrueColor mode, but in the second preview we switched to 8 bitplanes to be able to keep the refresh rate at a steady 50 frames a second, lower the memory footprint, and add parallax scrolling. In bitplane mode the blitter proved useful!
Daniel, New Beat - http://newbeat.atari.org. Like demos? Have a look at our new Falcon030 demo MORE.

LaurentS
Captain Atari
Posts: 268
Joined: Mon Jan 05, 2009 5:41 pm

Postby LaurentS » Tue Apr 17, 2018 8:49 pm

> Actually, in your case it's probably faster to just use a lot of move.l (a0)+,(a1)+ instead of the movem.l to avoid the add.l/lea.
I think the movem is still faster when paired with an add.l Dn,An.
I agree you should partially unroll your loop to reduce the number of dbf instructions, and increase the number of registers in the movem.

Is there a final version of Willie's Adventures ?

dhedberg
Atari Super Hero
Posts: 784
Joined: Mon Aug 30, 2010 8:36 am

Postby dhedberg » Wed Apr 18, 2018 7:42 am

LaurentS wrote:Is there a final version of Willie's Adventures ?

Thomas works on it on and off and is still determined to release it (someday). I kind of lost interest in the game when most of the coding was done as I couldn't contribute with level graphics or music (my skills are not good enough in that area, hehe). Time will tell I guess.

blabla
Atarian
Posts: 9
Joined: Tue Sep 06, 2016 7:30 pm

Postby blabla » Wed Apr 18, 2018 7:07 pm

I unrolled the loop 4 times, and simplified the function (since only that function uses it) and i could only notice a small improvement in performance.

Code: Select all

void Notrans_DrawSprite_NoChecks_scroll(BITMAP* b, u16* scr, unsigned short scroll )
{
   u16* dst = (u16*)scr;
   u16* src = (u16*)b->data;
   u16 row = b->height;   /* u16, so heights above 255 do not wrap */
   while (row--)
   {
      VFastCopy128(src+scroll, dst);
      dst += (RL_SCREEN_WIDTH);
      src += b->width;
   }
}

_VFastCopy128:   ; Copies one 640-byte row, 128 bytes per loop pass
; a0 = src
; a1 = dst
   movem.l   d2-d7/a2,-(a7)   ; save the callee-saved registers clobbered below
   moveq   #4,d0            ; dbra runs count+1 times: 5 passes x 128 bytes = 640
.copy:   
   movem.l   (a0)+,d1-d7/a2
   movem.l   d1-d7/a2,(a1)
   lea   32(a1),a1
   
   movem.l   (a0)+,d1-d7/a2
   movem.l   d1-d7/a2,(a1)
   lea   32(a1),a1

   movem.l   (a0)+,d1-d7/a2
   movem.l   d1-d7/a2,(a1)
   lea   32(a1),a1
   
   movem.l   (a0)+,d1-d7/a2
   movem.l   d1-d7/a2,(a1)
   lea   32(a1),a1
   
   dbra   d0,.copy
   movem.l   (a7)+,d2-d7/a2
   
   rts

When you say i should use more registers, do you mean the data registers or the address registers ?
Probably not the data registers since all of them are being used. And how do i disable the data cache ?
I admit that i'm fairly new to this <_<
(the assembly routines aren't actually mine, i never messed with motorola assembly until now)

Btw, i also tried using lots of move.l and indeed it seems to run a hair faster.

Code: Select all

_VFastCopy128:   ; Copies one 640-byte row, 32 bytes per loop pass
; a0 = src
; a1 = dst
   moveq   #19,d0      ; dbra runs count+1 times: 20 passes x 32 bytes = 640
.copy:   
   move.l   (a0)+,(a1)+
   move.l   (a0)+,(a1)+
   move.l   (a0)+,(a1)+
   move.l   (a0)+,(a1)+
   move.l   (a0)+,(a1)+
   move.l   (a0)+,(a1)+
   move.l   (a0)+,(a1)+
   move.l   (a0)+,(a1)+
   dbra   d0,.copy
   rts

However, i wouldn't call it buttery smooth... It's only like 1 or 2 frames faster.
So... assuming that i could disable the data cache, replace lea with add.l (which i couldn't get to work...) and use more registers,
could the movem version run faster than just using move.l ?

dhedberg wrote:There's a hardware limit on how large your virtual screen can be. In TrueColor you can have about 3 screens of 320 pixels each. So just loading a large bitmap and scrolling through it will not only require a lot of memory, it will also restrict the size of your "worlds/maps".

I'm actually not using the virtual screen :P (as some levels are larger than 3 screens)
It does require lots of memory (~400-525 kb for each level in this case) but so far i still have like 1Mb left, so i'm not worrying about memory consumption too much. (as far as the Falcon is concerned anyway)
I could decrease memory consumption if i wanted to (like my first attempt with tilemapped graphics or stream from the hard drive) but that would be much slower and it's not needed here so...

Feel free to try out the new minor release with the new routine... I still don't know how well (or bad) it runs.

dhedberg
Atari Super Hero
Posts: 784
Joined: Mon Aug 30, 2010 8:36 am

Postby dhedberg » Wed Apr 18, 2018 7:41 pm

blabla wrote:When you say i should use more registers, do you mean the data registers or the address registers ?
Probably not the data registers since all of them are being used.

Doesn't matter. Any free register will do.

blabla wrote:And how do i disable the data cache ?

It's disabled by default, so unless you have enabled it, it's off. It usually gives a small performance boost, but requires some care, as you need to disable it in cases where it does no good, like while copying larger amounts of sequential data. In your copy loop it would actually decrease the performance.
Here are some macros that you can use to control the data cache. Each one takes a scratch data register as its argument, and movec only works in supervisor mode:

Code: Select all

FreezeDataCache: MACRO
      movec   cacr,\1
      bset   #9,\1         ; Freeze data cache
      movec   \1,cacr
      ENDM

UnfreezeDataCache: MACRO
      movec   cacr,\1
      bclr   #9,\1         ; Unfreeze data cache
      movec   \1,cacr
      ENDM

UnfreezeAndClearDataCache: MACRO
      movec   cacr,\1
      bclr   #9,\1         ; Unfreeze data cache
      bset   #11,\1         ; Clear data cache
      movec   \1,cacr
      ENDM

DisableDataCache: MACRO
      movec   cacr,\1
      bclr   #8,\1         ; Disable data cache
      movec   \1,cacr
      ENDM

EnableDataCache: MACRO
      movec   cacr,\1
      bset   #8,\1         ; Enable data cache
      movec   \1,cacr
      ENDM

EnableAndClearDataCache: MACRO
      movec   cacr,\1
      bset   #8,\1         ; Enable data cache
      bset   #11,\1         ; Clear data cache
      movec   \1,cacr
      ENDM
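
For anyone more comfortable in C, the bit positions used by these macros can be written as masks. This is only a sketch of the bit arithmetic: the names are hypothetical, the bit numbers are the ones used in the macros above (8 = enable, 9 = freeze, 11 = clear), and the actual register access still has to be done with movec in supervisor mode.

```c
#include <stdint.h>

/* 68030 CACR data-cache bits, matching the bit numbers in the macros. */
#define CACR_ENABLE_DATA (1u << 8)
#define CACR_FREEZE_DATA (1u << 9)
#define CACR_CLEAR_DATA  (1u << 11)

/* Each helper takes the current CACR value and returns the new one. */
static uint32_t cacr_enable_and_clear_data(uint32_t cacr)
{
    return cacr | CACR_ENABLE_DATA | CACR_CLEAR_DATA;
}

static uint32_t cacr_disable_data(uint32_t cacr)
{
    return cacr & ~CACR_ENABLE_DATA;
}

static uint32_t cacr_freeze_data(uint32_t cacr)
{
    return cacr | CACR_FREEZE_DATA;
}

static uint32_t cacr_unfreeze_data(uint32_t cacr)
{
    return cacr & ~CACR_FREEZE_DATA;
}
```

All helpers leave the other bits untouched, so the instruction cache settings in the low byte are preserved.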

blabla wrote:I admit that i'm fairly new to this <_<
(the assembly routines aren't actually mine, i never messed with motorola assembly until now).

That's OK. Don't give up! 680x0 assembly is a lot of fun and once you get the hang of it, it is actually really simple. The good part is it gives you full control over the computer and the code. The fun part is optimizing code and finding clever ways to do things (remember to document those clever things or you may have a hard time figuring out your code later). The bad part is that some stuff is really tedious and messy to write/read in assembler (nested conditional logic, etc).

blabla wrote:Btw, i also tried using lots of move.l and indeed it seems to run a hair faster.
However, i wouldn't call it buttery smooth... It's only like 1 or 2 frames faster.
So... assuming that i could disable the data cache, replace lea with add.l (which i couldn't get to work...) and use more registers,
could the movem version run faster than just using move.l ?

Yes, at least that's my experience on real hardware (I haven't actually looked up the instruction execution timings for it), but it requires you to maximize the number of registers in the instruction's register list.

blabla wrote:I'm actually not using the virtual screen :P (as some levels are larger than 3 screens)
It does require lots of memory (~400-525 kb for each level in this case) but so far i still have like 1Mb left, so i'm not worrying about memory consumption too much. (as far as the Falcon is concerned anyway)
I could decrease memory consumption if i wanted to (like my first attempt with tilemapped graphics or stream from the hard drive) but that would be much slower and it's not needed here so...

I'll have to take a look at your game to see what you're doing. :-)

blabla wrote:Feel free to try out the new minor release with the new routine... I still don't know how well (or bad) it runs.

I will!

mlynn1974
Captain Atari
Posts: 194
Joined: Mon Mar 03, 2008 10:33 pm

Postby mlynn1974 » Wed Apr 18, 2018 7:48 pm

I also thought about unrolling the loop.
I don't know the effects of the 68030 cache or the instruction timing differences between the 68030 and the 68000, but I did some rough calculations.

At 16 MHz and 50 frames per second we have 320,000 clock cycles per frame.

dbra cost:
640/32=20 dbra instructions per scanline iteration
20*10=200cc when branch taken

200*240 lines high=48,000cc
(48,000/320,000)*100=15%
So that's 15% frame time just doing branches!
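
The estimate above can be reproduced with a few lines of C, which also makes it easy to try other unroll factors. This is a sketch under the same assumptions as the post: 16 MHz, 50 frames per second, and the 68000 figure of 10 cycles for a taken dbra (the 68030 timing differs, and the final untaken branch of each row is ignored). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CPU_HZ 16000000u
#define FRAME_HZ 50u
#define DBRA_TAKEN_CYCLES 10u  /* 68000 figure used in the post */

static uint32_t cycles_per_frame(void)
{
    return CPU_HZ / FRAME_HZ;
}

/* Cycles spent on dbra alone when copying `rows` rows of bytes_per_row
 * bytes, with bytes_per_iteration bytes copied per loop pass. */
static uint32_t dbra_cycles_per_frame(uint32_t bytes_per_row,
                                      uint32_t bytes_per_iteration,
                                      uint32_t rows)
{
    assert(bytes_per_iteration > 0);
    assert(bytes_per_row % bytes_per_iteration == 0);
    uint32_t branches_per_row = bytes_per_row / bytes_per_iteration;
    return branches_per_row * DBRA_TAKEN_CYCLES * rows;
}

/* Share of one frame, in whole percent (rounded down). */
static uint32_t percent_of_frame(uint32_t cycles)
{
    return cycles * 100u / cycles_per_frame();
}
```

With 32 bytes per pass this gives 48,000 cycles, or 15% of a frame. Copying 128 bytes per pass drops it to 12,000 cycles, a bit under 4%.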

Getting rid of the dbra frees up register d0 so use that to fill all data registers:
rept 20
movem.l (a0)+,d0-d7
movem.l d0-d7,(a1)
lea 32(a1),a1
endr
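Going by 68000 timings, a movem.l write to memory takes 8+(8*n) cycles = 72 cycles for 8 registers, and a movem.l read from memory takes 12+(8*n) = 76 cycles.
Maybe move.l (a0)+,(a1)+ is faster since it avoids the lea 32(a1),a1, but the subroutine will be smaller with movem.l.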

I apologize in advance for any errors. I don't have a Falcon (I wish I had) so I don't know what effect if any the 68030 256 byte instruction cache may have. Has anyone got a 68030 clock cycle sheet?

LaurentS
Captain Atari
Posts: 268
Joined: Mon Jan 05, 2009 5:41 pm

Postby LaurentS » Wed Apr 18, 2018 8:30 pm

> I don't know what effect if any the 68030 256 byte instruction cache may have.

That's the most important optimisation to take care of on the falcon.
In one of my previous demos, the same effect without and with the instruction cache gave me 9 and 4 VBLs.
Try to use it as much as possible for the best performance.

Laurent

dhedberg
Atari Super Hero
Posts: 784
Joined: Mon Aug 30, 2010 8:36 am

Postby dhedberg » Thu Apr 19, 2018 12:49 pm

mlynn1974 wrote:I also thought about unrolling the loop.

Getting rid of the dbra frees up register d0 so use that to fill all data registers:
rept 20
movem.l (a0)+,d0-d7
movem.l d0-d7,(a1)
lea 32(a1),a1
endr

I apologize in advance for any errors. I don't have a Falcon (I wish I had) so I don't know what effect if any the 68030 256 byte instruction cache may have. Has anyone got a 68030 clock cycle sheet?

The i-cache of the 68030 is the single most important thing to take into consideration when programming the Falcon. If you get rid of the loop there'll be no i-cache hits (unless this function is called repeatedly in a loop from outside where all the code executed fits in the i-cache). You need a loop (dbra) to take advantage of the i-cache, and you need to ensure that the size of the loop fits the 256 bytes of the i-cache.
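
A quick way to check whether an unrolled loop still fits is to add up the instruction sizes. The sketch below does that for the two loop shapes discussed in this thread. The sizes are the encoded lengths of these particular instructions (4 bytes for each movem.l, for lea 32(a1),a1 and for dbra; 2 bytes for move.l (a0)+,(a1)+). The helper names are hypothetical, and the estimate ignores code alignment and whatever the calling C loop adds.

```c
#include <assert.h>
#include <stdint.h>

#define ICACHE_BYTES 256u
#define MOVEM_BYTES 4u         /* opcode word + register mask word */
#define LEA_D16_BYTES 4u       /* lea 32(a1),a1: opcode + displacement */
#define DBRA_BYTES 4u          /* opcode + branch displacement */
#define MOVE_POSTINC_BYTES 2u  /* move.l (a0)+,(a1)+ */

/* Size of a loop holding `unroll` copies of movem/movem/lea plus dbra. */
static uint32_t movem_loop_bytes(uint32_t unroll)
{
    assert(unroll > 0);
    return unroll * (2u * MOVEM_BYTES + LEA_D16_BYTES) + DBRA_BYTES;
}

/* Size of a loop holding a run of move.l (a0)+,(a1)+ plus dbra. */
static uint32_t move_loop_bytes(uint32_t moves_per_iteration)
{
    assert(moves_per_iteration > 0);
    return moves_per_iteration * MOVE_POSTINC_BYTES + DBRA_BYTES;
}

static int fits_icache(uint32_t loop_bytes)
{
    return loop_bytes <= ICACHE_BYTES;
}
```

By this count the 4x unrolled movem loop is 52 bytes and the 8x move.l loop is 20 bytes, so both leave plenty of room. A movem loop only stops fitting beyond 21 unrolled copies.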

