Dune II / OpenDUNE for Falcon (and TT)

All about games on the Falcon, TT & clones

Moderators: Mug UK, lp, moondog/.tSCc., [ProToS], Moderator Team

kcr2000
Atari freak
Atari freak
Posts: 73
Joined: Sun Aug 19, 2012 8:20 am

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by kcr2000 »

Hi.
First of all great work. Thank you very much for that ;-)

Just tried all versions - only one runs fine, as far as I can tell (though obviously without any sound), on my HBS640 powered Mega ST (68000 with 36MHz + 4MB + Magnum ST 8MB + TOS 2.06). It is this "opendune-custom-g3834d303M-68000-TOS.zip" version. The intro is a bit slow, but the game itself plays very well and so far without any problems. Only quit to desktop doesn't work because TOS then freezes with false colours and you have to reset, but that's not a problem.

The other two versions throw bombs at the beginning and won't start.
CLAB Falcon MK2 CT60e 14MB/512MB 2x32GB CF FPU, Atari Falcon 030 14MB FPU, Atari TT030 10MB ST 256MB TT Crazy Dots 2 ThunderStorm 8GB IDE CF, Atari TT030 10MB ST 64MB TT Megavision 300, Atari Mega STE 4MB Crazy Dots, Atari Stacy 4 16MHz, Atari 1040 STE 4MB UltraSatan, Atari 520 STE, misc STf, STFM, 1040, 520, 260, Atari PC1, Atari PC5, Portfolio, NetUSBees etc. + misc 8-Bit Ataris
User avatar
nanard
Captain Atari
Captain Atari
Posts: 179
Joined: Mon Apr 04, 2016 2:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by nanard »

Eero Tamminen wrote: Thu Dec 24, 2020 7:41 pm "__mulsim3", GCC internal function for 68000 long multiplication is only ~1%, but if you end up optimizing everything, optimizing calls to that should also be eventually considered (it's e.g. something EmuTOS optimized).

Getting callgraph of its callers is easy:

Code: Select all

hatari_profile.py -p -g -r opendune.sym -g --only ___mulsi3 first-campaign-16mhz.txt
=> Resulting *-0.dot file shows 85% of its calls to come from GUI_DrawText(). Maybe its multiplications could be converted to short ones?
Indeed I Guess most multiplications should be short.
But there is the issue with C that short x short is short not long :(
4MB STE + CosmosEx /|\ MegaST4 + MegaFile 44 /|\ Stacy 4
User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3899
Joined: Sun Jul 31, 2011 1:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by Eero Tamminen »

If result is assigned to long, will GCC still provide only 16-bit result, or just assign MULU/MULS result to it? Or do you mean it would be a problem for other platforms which wouldn't do that?

(I guess adding multiplication inline/macro would not be acceptable upstream.)

__mulsim3() overhead is small enough that I think it needs to be considered only after larger performance issues have been optimized.

Did you have some symbol/function that is called exactly once every game frame?

I can use that to automatically detect and profile slowest frames, and to (manually) calculate average FPS values for longer profile runs.
ThorstenOtto
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3329
Joined: Sun Aug 03, 2014 5:54 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by ThorstenOtto »

Eero Tamminen wrote: Sat Dec 26, 2020 12:40 pm If result is assigned to long, will GCC still provide only 16-bit result, or just assign MULU/MULS result to it?
If the inputs are explicitly short, then yes. Even if the result is assigned to a long or int, because GCC knows that muls produces a 32bit result:

Code: Select all

$ cat foo.c
long test(short x, short y)
{
        return x * y;
}
$ m68k-atari-mint-gcc -S -O2 -fomit-frame-pointer -o  - foo.c
#NO_APP
        .text
        .even
        .globl  _test
_test:
        move.w 6(%sp),%d0
        muls.w 10(%sp),%d0
        rts

But if the inputs are int (which is the more common case because most of the time expressions have to be promoted to int when not using -mshort), the __mulsi32 function will be called instead:

Code: Select all

$ cat foo.c
long test(short x, short y, short z)
{
        return x * y * z;
}
$ m68k-atari-mint-gcc -S -O2 -fomit-frame-pointer -o  - foo.c
#NO_APP
        .text
        .even
        .globl  _test
_test:
        move.w 14(%sp),%a0
        move.l %a0,-(%sp)
        move.w 10(%sp),%d0
        muls.w 14(%sp),%d0
        move.l %d0,-(%sp)
        jsr ___mulsi3
        addq.l #8,%sp
        rts
__mulsim3() overhead is small
I would not underestimate that overhead. Even if the inputs are in the 16 bit range and only a single muls instruction will be executed, it is still a function call, which prevents GCC from using the registers that are clobbered by the call (d0/d1/a0/a1) for other local variables.
User avatar
nanard
Captain Atari
Captain Atari
Posts: 179
Joined: Mon Apr 04, 2016 2:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by nanard »

I have made 2 optimizations :
1. use pre-converted sound files. That was easy ;)
https://github.com/miniupnp/OpenDUNE/co ... f38b4a4e83
just unzip http://nanard.free.fr/opendune/DUNE2_ST ... _SOUND.zip in the DATA directory to have the .AU files next to the .PAK files.

2. Optimize GUI_DrawText()
https://github.com/miniupnp/OpenDUNE/co ... 9b51fa0f1a
https://github.com/miniupnp/OpenDUNE/co ... 533e76c245

http://nanard.free.fr/opendune/opendune ... 00-TOS.zip

I'm trying to compile with -mshort but there are some issues, there are a lot of enums which are supposed to be 32bits values

I also had quite a few bombs

Code: Select all

Address Error at address $f61738d3, PC=$72a0a addr_e3=72a0e op_e3=2543
Address Error at address $10b5cf, PC=$358d6 addr_e3=358da op_e3=2828
it looks like there are still some unaligned memory access :(
4MB STE + CosmosEx /|\ MegaST4 + MegaFile 44 /|\ Stacy 4
ThorstenOtto
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3329
Joined: Sun Aug 03, 2014 5:54 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by ThorstenOtto »

nanard wrote: Sat Dec 26, 2020 10:55 pm I'm trying to compile with -mshort but there are some issues, there are a lot of enums which are supposed to be 32bits values
That only works if your are using libcmini or similar. mintlib does not have an -mshort variant currently. Same applies to all other libraries you need, like SDL.
User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3899
Joined: Sun Jul 31, 2011 1:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by Eero Tamminen »

nanard wrote: Sat Dec 26, 2020 10:55 pm I'm trying to compile with -mshort but there are some issues, there are a lot of enums which are supposed to be 32bits values
I wouldn't do that on code base that hasn't been designed for 16-bit int, you can be left with *lots* of subtle bugs with code that assumes int to be at least 32-bit.

It's much better just to change the types of specific variables used in multiplications, after you've determined that all the possible values for them fit into 16-bit, especially when there's just one function (+ anything statically inlined to it) where that needs to be done.
User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3899
Joined: Sun Jul 31, 2011 1:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by Eero Tamminen »

nanard wrote: Sat Dec 26, 2020 10:55 pm I have made 2 optimizations :
1. use pre-converted sound files. That was easy ;)
https://github.com/miniupnp/OpenDUNE/co ... f38b4a4e83
just unzip http://nanard.free.fr/opendune/DUNE2_ST ... _SOUND.zip in the DATA directory to have the .AU files next to the .PAK files.

2. Optimize GUI_DrawText()
https://github.com/miniupnp/OpenDUNE/co ... 9b51fa0f1a
https://github.com/miniupnp/OpenDUNE/co ... 533e76c245

http://nanard.free.fr/opendune/opendune ... 00-TOS.zip
Maybe the uint16 typecast could be part of the define itself?

I re-played & re-profiled the first campaign, but it would be really good if you could provide a recording of some later campaign.

Relative __mulsim3() usage results of the optimizations are following:
- GUI_DrawText: 85% -> 0%
- GFX_DrawTile: 6.6% -> 43%
- __umodsi3: 4.5% ->30%
=> __mulsim3() calls dropped to about one fifth or sixth (can't be sure as I probably didn't play campaign exactly as on previous round)

Results of this profiling round don't anymore show __mulsim3():

Code: Select all

Used cycles:
  46.68%  15614955836   _c2p1x1_4_st
   6.86%  2293825948   _GUI_DrawSprite
   6.51%  2176628472   _GFX_DrawTile
   5.56%  1861178984   _Video_SetPalette
   4.11%  1374557128   ROM_TOS
   3.27%  1093859804   _MPU_Interrupt
   3.13%  1048445400   _GUI_Widget_Viewport_D_3
   1.93%   646953932   copy16
   1.17%   390188652   _GFX_SetPalette
   1.05%   352289824   _GUI_Widget_Viewport_D
   0.95%   316219328   _GUI_Widget_HandleEven
   0.90%   302294472   copy16_d
   0.86%   289205132   _Script_Run
   0.84%   280488744   _GFX_Screen_Copy
   0.81%   269634276   _GameLoop_Unit
   0.76%   255278088   _Map_UpdateAround
   0.71%   236296368   _Unit_Find
   0.54%   181235124   _Map_Update
   0.52%   173612604   _Timer_InterruptRun.co
   0.48%   159950172   _GUI_DrawText
   0.47%   156530316   _Map_IsValidPosition
   0.46%   155184088   _GUI_DrawFilledRectang
   0.45%   151234040   _Video_Tick
   0.41%   138106840   common2
   0.41%   136135192   _Map_Update.constprop.
   0.38%   128690256   _Unit_Sort
   0.34%   113688236   _GFX_Screen_SetDirty
   0.32%   107683264   _Animation_Tick
   0.28%    92804764   less256
   0.25%    85092520   ___udivsi3
   0.22%    73675972   _Map_MarkTileDirty
   0.21%    70799492   _memcpy
(Symbols without '_' prefix are different parts of memcpy.)

As expected, relative Gfx conversion cost is now slightly larger with mulsim3 overhead gone.

PS. Binary works as well as the earlier ones with emulated 16 MHz MegaSTE (+EmuTOS):
- intro freezes, if not interrupted
- selecting next campaign crashes when the Harkonnen areas are painted (I'm playing as Atreides)

Code: Select all

WARN : Address Error writing at address $cb3b9, PC=$8825e addr_e3=8825e op_e3=3080

Panic: Address Error
misc=3081 opcode=3080
addr=000cb3b9 sr=0300 pc=0008825e

D0-3:00000008 00000000 0000002c 00000002
D4-7:000000ff 00000004 00000000 00000001
A0-3:000cb3b9 0000005a 000cafb6 000cae06
A4-7:00028c61 000cb02a 000caf8e 000070fa
User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3899
Joined: Sun Jul 31, 2011 1:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by Eero Tamminen »

I would have thought that palette doesn't need changing during normal game play, but GFX_SetPalette() + Video_SetPalette() take together >6.5% of the cycles!

Their callers can be found with following ("calls" to interrupt handlers are ignored):

Code: Select all

hatari_profile -p -g -r opendune.sym --ignore-to ikbd,ikbd_mousex,ikbd_mousey --only _GUI_PaletteAnimate,_Video_SetPalette first-mission.txt
See:
opendune-set-palette.png
Is this cost expected?
You do not have the required permissions to view the files attached to this post.
User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3899
Joined: Sun Jul 31, 2011 1:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by Eero Tamminen »

When the intro freezes at 16Mhz, game spends about all of its time doing this (abbreviated) callchain: Warning() -> vprintf() -> flusbuf() -> fstat() -> mktime().

"trace gemdos" shows game constantly appending data to a file, which "info gemdos" shows to be error log:

Code: Select all

Open GEMDOS HDD file handles:
- 64 (0x13a52): opendune-g0cffd21dM-atari-audio-68000/error.log
- 65 (0x13a52): opendune-g0cffd21dM-atari-audio-68000/data/INTRO.PAK
Hopefully attached error log tells you something:
error.log.txt
Btw. game doesn't seem to re-use DTAs, so one can see from "info gemdos" easily a history of what the game has accessed recently:

Code: Select all

Internal Fsfirst() DTAs:
+ 0: opendune-g0cffd21dM-atari-audio-68000
  - 0: error.log
  Fsnext entry = 1.
+ 1: opendune-g0cffd21dM-atari-audio-68000
  - 0: opendune.ini
  Fsnext entry = 1.
+ 2: opendune-g0cffd21dM-atari-audio-68000
  - 0: saves
  Fsnext entry = 1.
+ 3: opendune-g0cffd21dM-atari-audio-68000/data
  - 0: 3HOUSES.AU
...
  - 310: ZREPORT3.AU
  Fsnext entry = 311.
+ 4: opendune-g0cffd21dM-atari-audio-68000/data
  - 0: ATRE.PAK
  Fsnext entry = 1.
...
+ 84: opendune-g0cffd21dM-atari-audio-68000/data
  - 0: BLOWUP1.AU
  Fsnext entry = 1.
+ 85: opendune-g0cffd21dM-atari-audio-68000/data
  - 0: BLOWUP2.AU
  Fsnext entry = 1.
+ 86: opendune-g0cffd21dM-atari-audio-68000/data
  - 0: ENGLISH.PAK
  Fsnext entry = 1.
+ 87: opendune-g0cffd21dM-atari-audio-68000/data
  - 0: INTRO.PAK
  Fsnext entry = 1.
I would assume last one to indicate he last read file.
You do not have the required permissions to view the files attached to this post.
User avatar
nanard
Captain Atari
Captain Atari
Posts: 179
Joined: Mon Apr 04, 2016 2:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by nanard »

ThorstenOtto wrote: Sun Dec 27, 2020 5:14 am
nanard wrote: Sat Dec 26, 2020 10:55 pm I'm trying to compile with -mshort but there are some issues, there are a lot of enums which are supposed to be 32bits values
That only works if your are using libcmini or similar. mintlib does not have an -mshort variant currently. Same applies to all other libraries you need, like SDL.
I will look into libcmini then.
4MB STE + CosmosEx /|\ MegaST4 + MegaFile 44 /|\ Stacy 4
User avatar
nanard
Captain Atari
Captain Atari
Posts: 179
Joined: Mon Apr 04, 2016 2:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by nanard »

Eero Tamminen wrote: Sun Dec 27, 2020 10:10 am
nanard wrote: Sat Dec 26, 2020 10:55 pm I'm trying to compile with -mshort but there are some issues, there are a lot of enums which are supposed to be 32bits values
I wouldn't do that on code base that hasn't been designed for 16-bit int, you can be left with *lots* of subtle bugs with code that assumes int to be at least 32-bit.

It's much better just to change the types of specific variables used in multiplications, after you've determined that all the possible values for them fit into 16-bit, especially when there's just one function (+ anything statically inlined to it) where that needs to be done.
Dune II is was designed for x86 16bits mode "real mode". I'm almost sure it runs on a 286 CPU which has no 32bits registers. 286 and 68000 are similar in the way they are 16bits CPUs with 24bits address bus, but the 68000 do have 32bits registers whereas the intel 286 has only 16bits registers and address the memory in 64KB segments.
OpenDUNE was written so most code is using explicitly 16bits variables, and I guess almost all 32bits arithmetics happens because of integer promotion
4MB STE + CosmosEx /|\ MegaST4 + MegaFile 44 /|\ Stacy 4
User avatar
nanard
Captain Atari
Captain Atari
Posts: 179
Joined: Mon Apr 04, 2016 2:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by nanard »

Eero Tamminen wrote: Sun Dec 27, 2020 11:52 am I would have thought that palette doesn't need changing during normal game play, but GFX_SetPalette() + Video_SetPalette() take together >6.5% of the cycles!
There are color cycling that happens during the game in Dune II :

Code: Select all

color 223 = Windtrap cycling color
color 239 : highlighted text in structure menu : PLACE IT / REPAIR etc.
          is alterning white / red
color 255 = unit/structure selection (palette cycling) on the map
            and also in factories/construction yard menu
4MB STE + CosmosEx /|\ MegaST4 + MegaFile 44 /|\ Stacy 4
User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3899
Joined: Sun Jul 31, 2011 1:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by Eero Tamminen »

nanard wrote: Sun Dec 27, 2020 8:13 pm
Eero Tamminen wrote: Sun Dec 27, 2020 11:52 am I would have thought that palette doesn't need changing during normal game play, but GFX_SetPalette() + Video_SetPalette() take together >6.5% of the cycles!
There are color cycling that happens during the game in Dune II :

Code: Select all

color 223 = Windtrap cycling color
color 239 : highlighted text in structure menu : PLACE IT / REPAIR etc.
          is alterning white / red
color 255 = unit/structure selection (palette cycling) on the map
            and also in factories/construction yard menu
6.5% seems a bit excessive considering that Windtrap isn't there from beginning of the campaign, and for large part of the game, it's not even visible on screen.

Looking at the callgraph screenshot in my previous post, you can see that during my first campaign play, main() calls GUI_PaletteAnimate() 19510 times, and GUI_DisplayFactoryWin() additional 9124 times, for total of 29470 calls. Of those calls, GFX_SetPalette() is called 10871 i.e. approximately every 3rd animation call, or every 2nd game frame (I assume main() calls animate every frame).

In my campaign profile, most time in GFX_SetPalette() is consumed by the loop at the beginning:

Code: Select all

_GFX_SetPalette:
$00021be2 :             movem.l   d2-d7/a2-a6,-(sp)          0.00% (11106, 1067260, 0, 0)
$00021be6 :             movea.l   $30(sp),a1                 0.00% (11106, 177808, 0, 0)
$00021bea :             move.l    #$9c3de,d7                 0.00% (11106, 133328, 0, 0)
$00021bf0 :             movea.l   d7,a0                      0.00% (11106, 44536, 0, 0)
$00021bf2 :             movea.l   a1,a2                      0.00% (11106, 44424, 0, 0)
$00021bf4 :             moveq     #0,d1                      0.00% (11106, 44480, 0, 0)
$00021bf6 :             movea.l   a2,a5                      0.09% (2560608, 10256712, 0, 0)
$00021bf8 :             suba.l    a1,a5                      0.09% (2560608, 20504844, 0, 0)
$00021bfa :             move.b    (a2),d0                    0.09% (2560608, 20506608, 0, 0)
$00021bfc :             cmp.b     (a0),d0                    0.09% (2560608, 20508780, 0, 0)
$00021bfe :             bne.s     $21c34                     0.09% (2560608, 20529556, 0, 0)
$00021c00 :             move.b    1(a2),d4                   0.09% (2555583, 30696188, 0, 0)
$00021c04 :             cmp.b     1(a0),d4                   0.09% (2555583, 30700132, 0, 0)
$00021c08 :             bne.s     $21c34                     0.09% (2555583, 20471476, 0, 0)
$00021c0a :             move.b    2(a2),d0                   0.09% (2555583, 30700844, 0, 0)
$00021c0e :             cmp.b     2(a0),d0                   0.09% (2555583, 30698960, 0, 0)
$00021c12 :             bne.s     $21c34                     0.09% (2555583, 20493816, 0, 0)
$00021c14 :             addq.l    #1,d1                      0.09% (2549511, 20419856, 0, 0)
$00021c16 :             addq.l    #3,a2                      0.09% (2549511, 20417552, 0, 0)
$00021c18 :             addq.l    #3,a0                      0.09% (2549511, 20420184, 0, 0)
$00021c1a :             cmpi.l    #$100,d1                   0.09% (2549511, 35729526, 0, 0)
$00021c20 :             bne.s     $21bf6                     0.09% (2549511, 25524510, 0, 0)
$00021c22 :             move.l    #$21bc2,$30(sp)            0.00% (9, 216, 0, 0)
...
I.e. this code: https://github.com/OpenDUNE/OpenDUNE/bl ... gfx.c#L467

I think using just memcmp() would be more effective.


Most time consuming part in Video_SetPalette() is obviously the ST/STE palette color handling:
https://github.com/OpenDUNE/OpenDUNE/bl ... ari.c#L534

If those palette changes are only needed for a very small set of fixed palettes, you could consider adding specific shortcuts for them. These specific palettes could be recognized either with memcmp(), or by annotating the palette calls (with extra arg) so that the resulting ST/STE palette can be directly taken from some pre-defined array, instead of each color needing to be separately calculated / searched for.

As to the rest of that function, you could split it into Atari machine specific functions, and in Video_Init() just assign Video_SetPalette() pointer to the correct one for given machine.
User avatar
nanard
Captain Atari
Captain Atari
Posts: 179
Joined: Mon Apr 04, 2016 2:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by nanard »

We know that palette 256c => 16c remaping is not efficient.
First of all, I'm looking what are the palettes used in the Amiga port to see if we can do better than a fixed palette for the whole game.
Unfortunately, the amiga palettes are 32 colors, not 16.

Then of course the color cycling must be specifically handled.
4MB STE + CosmosEx /|\ MegaST4 + MegaFile 44 /|\ Stacy 4
User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3899
Joined: Sun Jul 31, 2011 1:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by Eero Tamminen »

What about replacing this loop in the generic code: https://github.com/OpenDUNE/OpenDUNE/bl ... gfx.c#L467

With memcmp()?

Also, is there some function that's called exactly once every game frame, no more, no less?
User avatar
nanard
Captain Atari
Captain Atari
Posts: 179
Joined: Mon Apr 04, 2016 2:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by nanard »

Eero Tamminen wrote: Wed Dec 30, 2020 9:43 pm What about replacing this loop in the generic code: https://github.com/OpenDUNE/OpenDUNE/bl ... gfx.c#L467

With memcmp()?
well... how is it supposed to work ? memcmp() doesn't return the index where the values are different, only the value difference.
Eero Tamminen wrote: Wed Dec 30, 2020 9:43 pm Also, is there some function that's called exactly once every game frame, no more, no less?
what about

Code: Select all

Video_ShowFPS_2
?
4MB STE + CosmosEx /|\ MegaST4 + MegaFile 44 /|\ Stacy 4
mikro
Hardware Guru
Hardware Guru
Posts: 4566
Joined: Sat Sep 10, 2005 11:11 am
Location: Kosice, Slovakia
Contact:

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by mikro »

I think the right question here is why do you have to copy/set palette in such a complex way? It's only 256 values after all.
User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3899
Joined: Sun Jul 31, 2011 1:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by Eero Tamminen »

nanard wrote: Sun Jan 03, 2021 11:32 pm
Eero Tamminen wrote: Wed Dec 30, 2020 9:43 pm What about replacing this loop in the generic code: https://github.com/OpenDUNE/OpenDUNE/bl ... gfx.c#L467

With memcmp()?
well... how is it supposed to work ? memcmp() doesn't return the index where the values are different, only the value difference.
The check following the first loop doesn't care about what the index is. Speed improvement from memcmp() would compensate impact of scanning the whole palette in next loop. It would speed up cases when palette hasn't changed, and slow cases where changes are in latter palette entries. But you're right, it's a bad compromise, one should optimize the slow (but relatively common) cases, not fast paths.

If one could assume palette to be in suitably aligned address (as memory returned by malloc is), comparison could be done as longs (untested):

Code: Select all

        uint32_t *long_palette = (uint32_t*)palette;
        uint32_t *long_active = (uint32_t*)g_paletteActive;
	for (from = 0; from < 256*3/4; from++) {
		if(*long_palette++ != *long_active++)
			break;
	}
	if (from >= 256*3/4) {
		Warning("Useless GFX_SetPalette() call\n");
		return;
	}
	from = 4*from/3;
Depending on the palette index, this would sometimes set an extra palette entry redundantly, due to integer division rounding "from" value down.

Third alternative would be to typedef palette entries, and compare whole entries. GCC might be able to optimize that slightly better than comparison of their individual RGB components (like the current code does).

I think that on modern CPUs, last option would be fastest after palette entry would have been changed to be 4 bytes, so that they align with 32-bit access, instead of 3.
User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3899
Joined: Sun Jul 31, 2011 1:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by Eero Tamminen »

nanard wrote: Sun Jan 03, 2021 11:32 pm
Eero Tamminen wrote: Wed Dec 30, 2020 9:43 pm Also, is there some function that's called exactly once every game frame, no more, no less?
what about

Code: Select all

Video_ShowFPS_2
?
But that's called only when FPS showing is enabled, which one needs to enable manually, right? (need for manual game actions would make it unsuitable for profiling automation)

Video_ShowFPS_2() seems to be called only from Video_Tick(), is "Video_Tick" something which would be called exactly once per drawn frame?
User avatar
nanard
Captain Atari
Captain Atari
Posts: 179
Joined: Mon Apr 04, 2016 2:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by nanard »

Eero Tamminen wrote: Mon Jan 04, 2021 2:56 pm
nanard wrote: Sun Jan 03, 2021 11:32 pm
Eero Tamminen wrote: Wed Dec 30, 2020 9:43 pm What about replacing this loop in the generic code: https://github.com/OpenDUNE/OpenDUNE/bl ... gfx.c#L467

With memcmp()?
well... how is it supposed to work ? memcmp() doesn't return the index where the values are different, only the value difference.
The check following the first loop doesn't care about what the index is. Speed improvement from memcmp() would compensate impact of scanning the whole palette in next loop. It would speed up cases when palette hasn't changed, and slow cases where changes are in latter palette entries. But you're right, it's a bad compromise, one should optimize the slow (but relatively common) cases, not fast paths.

If one could assume palette to be in suitably aligned address (as memory returned by malloc is), comparison could be done as longs (untested):

Code: Select all

        uint32_t *long_palette = (uint32_t*)palette;
        uint32_t *long_active = (uint32_t*)g_paletteActive;
	for (from = 0; from < 256*3/4; from++) {
		if(*long_palette++ != *long_active++)
			break;
	}
	if (from >= 256*3/4) {
		Warning("Useless GFX_SetPalette() call\n");
		return;
	}
	from = 4*from/3;
Depending on the palette index, this would sometimes set an extra palette entry redundantly, due to integer division rounding "from" value down.

Third alternative would be to typedef palette entries, and compare whole entries. GCC might be able to optimize that slightly better than comparison of their individual RGB components (like the current code does).

I think that on modern CPUs, last option would be fastest after palette entry would have been changed to be 4 bytes, so that they align with 32-bit access, instead of 3.
The real optimisation is to change the DuneII code to call SetPalette with the indexes it is changing instead of trying to send the whole palette even when a few entries changed (which is what it does)
4MB STE + CosmosEx /|\ MegaST4 + MegaFile 44 /|\ Stacy 4
User avatar
nanard
Captain Atari
Captain Atari
Posts: 179
Joined: Mon Apr 04, 2016 2:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by nanard »

Eero Tamminen wrote: Mon Jan 04, 2021 3:02 pm
nanard wrote: Sun Jan 03, 2021 11:32 pm
Eero Tamminen wrote: Wed Dec 30, 2020 9:43 pm Also, is there some function that's called exactly once every game frame, no more, no less?
what about

Code: Select all

Video_ShowFPS_2
?
But that's called only when FPS showing is enabled, which one needs to enable manually, right? (need for manual game actions would make it unsuitable for profiling automation)

Video_ShowFPS_2() seems to be called only from Video_Tick(), is "Video_Tick" something which would be called exactly once per drawn frame?
yes of course, you can use Video_Tick.
4MB STE + CosmosEx /|\ MegaST4 + MegaFile 44 /|\ Stacy 4
User avatar
nanard
Captain Atari
Captain Atari
Posts: 179
Joined: Mon Apr 04, 2016 2:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by nanard »

nanard wrote: Mon Jan 04, 2021 5:51 pm
Eero Tamminen wrote: Mon Jan 04, 2021 2:56 pm
nanard wrote: Sun Jan 03, 2021 11:32 pm
well... how is it supposed to work ? memcmp() doesn't return the index where the values are different, only the value difference.
The check following the first loop doesn't care about what the index is. Speed improvement from memcmp() would compensate impact of scanning the whole palette in next loop. It would speed up cases when palette hasn't changed, and slow cases where changes are in latter palette entries. But you're right, it's a bad compromise, one should optimize the slow (but relatively common) cases, not fast paths.

If one could assume palette to be in suitably aligned address (as memory returned by malloc is), comparison could be done as longs (untested):

Code: Select all

        uint32_t *long_palette = (uint32_t*)palette;
        uint32_t *long_active = (uint32_t*)g_paletteActive;
	for (from = 0; from < 256*3/4; from++) {
		if(*long_palette++ != *long_active++)
			break;
	}
	if (from >= 256*3/4) {
		Warning("Useless GFX_SetPalette() call\n");
		return;
	}
	from = 4*from/3;
Depending on the palette index, this would sometimes set an extra palette entry redundantly, due to integer division rounding "from" value down.

Third alternative would be to typedef palette entries, and compare whole entries. GCC might be able to optimize that slightly better than comparison of their individual RGB components (like the current code does).

I think that on modern CPUs, last option would be fastest after palette entry would have been changed to be 4 bytes, so that they align with 32-bit access, instead of 3.
The real optimisation is to change the DuneII code to call SetPalette with the indexes it is changing instead of trying to send the whole palette even when a few entries changed (which is what it does)
If you want to optimize specifically for ST/STE, you can remove all the color cycling code entireley anyway as it doesnt work anyway because of the palette 256c => 16c remapping.
4MB STE + CosmosEx /|\ MegaST4 + MegaFile 44 /|\ Stacy 4
User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3899
Joined: Sun Jul 31, 2011 1:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by Eero Tamminen »

nanard wrote: Mon Jan 04, 2021 5:51 pm The real optimisation is to change the DuneII code to call SetPalette with the indexes it is changing instead of trying to send the whole palette even when a few entries changed (which is what it does)
What about something like in the attached (untested!) patch, where current Atari SetPalette() function is optimized slightly, and build define controls whether it's called separately for each changed color (68000 version with ST pal support), or for the whole change range (version with TT / Falcon pal support)?
set-each-pal-entry-separately.patch.txt
You do not have the required permissions to view the files attached to this post.
User avatar
Eero Tamminen
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3899
Joined: Sun Jul 31, 2011 1:11 pm

Re: Dune II / OpenDUNE for Falcon (and TT)

Post by Eero Tamminen »

nanard wrote: Mon Jan 04, 2021 5:52 pm yes of course, you can use Video_Tick.
Hm. That's called from a timer callback, and doesn't have at end any function call that would be called exactly once per frame. I would actually want symbols marking both rendering start and end, so that I could separately profile worst frames for rendering time, and game engine "thinking" time. Any ideas for that?
Post Reply

Return to “Games”