ScummVM/Falcon060 pre-release

Latest news in the Atari world


mikro
Hardware Guru
Posts: 3302
Joined: Sat Sep 10, 2005 11:11 am
Location: Kosice, Slovakia

Re: ScummVM/Falcon060 pre-release

Post by mikro »

Eero Tamminen wrote: Fri Mar 24, 2023 9:12 pm
mikro wrote: Fri Mar 24, 2023 7:50 am Btw, that getSciVersion() is another assert-hungry candidate:

Code:

SciVersion getSciVersion() {
	assert(s_sciVersion != SCI_VERSION_NONE);
	return s_sciVersion;
}
I'm pretty sure this will be gone when I provide you with an assert-less version.
Nope, these costs are without asserts; the issue is how often these functions are called (maybe also branching messing up the i-cache):
I'm a bit at a loss. Earlier you posted this profile dump:

Code:

Used cycles:
  15.87%   3.43%  17.98%  7295262206 1577437581 8263182770 * Sci::run_vm(Sci::EngineState*)
   8.97%   8.99%  12.35%  4120430607 4129554603 5675595668   Sci::reg_t::getOffset() const
   6.33%   6.34%   6.34%  2908561335 2914978997 2914978997   Sci::getSciVersion()
To me it looks like those assert checks are taking that CPU time (6.33% attributed to the version, the rest to the offset), and removing them would significantly decrease the CPU load?
An additional problem I noticed is that interacting with the game options dialog constantly freezes when hovering over things that have tooltips.
Yeah, that's a problem in the GUI code: each tooltip triggers a complete clear & copy. It's equally bad in the 8bpp version of the overlay.

The run_vm() analysis is more than enough. If it's the VM that is taking so long, I don't care that much.
mikro
Hardware Guru
Posts: 3302
Joined: Sat Sep 10, 2005 11:11 am
Location: Kosice, Slovakia

Re: ScummVM/Falcon060 pre-release

Post by mikro »

Eero: here is a snapshot of the latest scummvm master tree: https://mikro.naprvyraz.sk/private/scum ... 230326.zip (incl. dlmalloc, excl. asserts). Delete any old scummvm.ini files as well as the old data folder, just to be sure.

If you like, you can test:
- how well the default (triple buffer) rendering performs (it should be a bit slower than single buffer but waaay better than the previous -fullscreen- implementation) ... especially when comparing in The Dig: viewtopic.php?p=443822#p443822; you've also mentioned some performance regression with 2.6.1 vs 2.6.2: "VGA (DOS) full SOMI version has very long pauses in cursor interaction, same as EGA version. Neither would be playable on TT" (btw, the EGA version is rendered in 640x400 instead of 320x200, so there's your slowdown mentioned earlier...)
- whether our gfxbit and sciversion friends are still there
- whether you can still observe some cursor/redraw errors
- how well the 8bpp overlay performs (I've committed a few optimisations into master when it comes to image loading)
- whether you see some speedup/slowdown in general

It's possible that there's one hard-to-catch race condition when using double/triple buffering. It looks as if one buffer has "forgotten" to update, i.e. if something is zooming, you see leftovers from the previous frame. It's incredibly rare, and if you detect it, please tell me. (I'll try to think about it even more; I suspect it has something to do with the combination of a quick (a few pixels) screen update and the delayed setting of the screen address in the VBL, but I can't be sure. It can also be some ARAnyM quirk, as I haven't seen it on real hardware yet.)
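To make the suspected failure mode concrete, here is a minimal C++ model of triple buffering with a VBL-latched screen address. It is a sketch under my own assumptions, not the actual AtariGraphicsManager code: TripleBuffer, markDirty(), present() and onVbl() are hypothetical names, and the buffer indices stand for three screen addresses. It illustrates two rules: a small dirty rectangle has to reach every buffer, and the main loop must never draw into the buffer that is still waiting for the VBL. Breaking either one produces exactly this kind of "forgotten update" leftover.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect { int x, y, w, h; };

// Hypothetical model of a triple-buffered screen; not the real backend code.
class TripleBuffer {
public:
    static constexpr int kBuffers = 3;

    // Any screen update, even a few pixels, is queued for *every* buffer.
    // Applying it only to the buffer being drawn leaves the other two with
    // stale pixels, which later reappear as leftovers from an older frame.
    void markDirty(const Rect &r) {
        for (auto &list : _dirty)
            list.push_back(r);
    }

    // Main loop: bring the chosen back buffer up to date, then hand it to
    // the VBL handler as the pending screen address.
    int present() {
        const int back = pickBackBuffer();
        _flushed += _dirty[back].size();   // stand-in for the real copy + C2P
        _dirty[back].clear();
        _pending.store(back);              // the VBL will latch it later
        return back;
    }

    // VBL interrupt: the delayed switch of the screen address.
    void onVbl() {
        const int p = _pending.exchange(-1);
        if (p >= 0)
            _displayed.store(p);
    }

    int displayed() const { return _displayed.load(); }
    std::size_t flushedRects() const { return _flushed; }
    std::size_t queued(int buf) const { return _dirty[buf].size(); }

private:
    // Never draw into the buffer on screen, nor into the one still waiting
    // for the VBL; with three buffers one is always free.
    int pickBackBuffer() const {
        const int shown = _displayed.load();
        const int pending = _pending.load();
        for (int i = 0; i < kBuffers; ++i)
            if (i != shown && i != pending)
                return i;
        return (shown + 1) % kBuffers;     // not reached with three buffers
    }

    std::vector<Rect> _dirty[kBuffers];
    std::atomic<int> _displayed{0};
    std::atomic<int> _pending{-1};
    std::size_t _flushed = 0;
};
```

With three buffers, a second present() before the next VBL simply replaces the pending frame, which keeps the newest image on screen at the cost of dropping one.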

P.S. It's a work in progress, so don't promote / upload it anywhere.

P.P.S. Forgot to attach readme.txt: https://github.com/scummvm/scummvm/blob ... readme.txt

P.P.P.S. Fixed two small bugs (vsync must always be off in the overlay + rendering from 640x400)
Eero Tamminen
Fuji Shaped Bastard
Posts: 3108
Joined: Sun Jul 31, 2011 1:11 pm

Re: ScummVM/Falcon060 pre-release

Post by Eero Tamminen »

mikro wrote: Sat Mar 25, 2023 4:49 pm To me it looks like those assert checks are taking that CPU time (6.33% attributed to the version, the rest to the offset), and removing them would significantly decrease the CPU load?
Nothing says that any of the costs go to assert checks. Before making assumptions, and optimizations based on them, one should always verify them against the disassembly of the given function (in the profile data file), especially if it's supposed to be small (= easy to check).

As you could see from the disassembly I posted, the 6.33% cost of getSciVersion() comes from just MOVE.L + RTS; there was no assert check (comparison instruction). Either the compiler was able to optimize that assert away (as it could never fail), or asserts were not enabled for it.

The cost can be high when a function is called often enough. Trivial subroutines that are called more often than any other function can be quite costly compared to the work they do, because they disrupt CPU instruction prefetching.

EDIT: looking at your ScummVM sources, the compiler cannot optimize away the assert for an uninitialized version, as the version value is assigned dynamically in another method. I.e. you had already provided me with a binary that has asserts disabled (at least for the resource.cpp file). :-)
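For reference, assert() is switched off per translation unit by NDEBUG, which is how a binary can end up with asserts disabled for resource.cpp but not necessarily elsewhere. The C++ sketch below shows this in a single file. The C standard allows <cassert> to be included repeatedly, and each inclusion redefines assert() according to whether NDEBUG is defined at that point. The SciVersion values and the counter are stand-ins for illustration, not ScummVM's definitions. The counter shows that a disabled assert() does not even evaluate its argument, so no comparison instruction is left in the function.

```cpp
// Hypothetical stand-ins, not ScummVM's real definitions.
enum SciVersion { SCI_VERSION_NONE, SCI_VERSION_1_1 };
static SciVersion s_sciVersion = SCI_VERSION_1_1;
static int s_checksEvaluated = 0;   // how often the assert condition actually ran

static bool versionIsSet() {
    ++s_checksEvaluated;
    return s_sciVersion != SCI_VERSION_NONE;
}

// Like a .cpp compiled with -DNDEBUG: assert() expands to ((void)0).
#ifndef NDEBUG
#define NDEBUG
#endif
#include <cassert>
SciVersion getSciVersionNoAsserts() {
    assert(versionIsSet());   // argument not evaluated, no compare/branch emitted
    return s_sciVersion;
}

// Like a .cpp compiled without NDEBUG: re-including <cassert> redefines assert().
#undef NDEBUG
#include <cassert>
SciVersion getSciVersionWithAsserts() {
    assert(versionIsSet());   // compare + branch to the failure handler
    return s_sciVersion;
}
```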
mikro
Hardware Guru
Hardware Guru
Posts: 3302
Joined: Sat Sep 10, 2005 11:11 am
Location: Kosice, Slovakia

Re: ScummVM/Falcon060 pre-release

Post by mikro »

Hmm, it's possible I did it in the dlmalloc version, indeed. In that case, if it's still in the above-posted master build, I'll provide a patch to inline at least those two (and if you spot other obvious candidates, please do tell, I'll include them as well).
Eero Tamminen
Fuji Shaped Bastard
Posts: 3108
Joined: Sun Jul 31, 2011 1:11 pm

Re: ScummVM/Falcon060 pre-release

Post by Eero Tamminen »

(This is still with the first dlmalloc binary.)
mikro wrote: Sat Mar 25, 2023 4:43 pm
"Gob::DataIO::unpackChunk()" makes half that number of "Common::SafeSeekableSubReadStream::read()" calls, ending in read(), which makes two isatty() calls each time it's called.
Maybe the game is doing seeking & depacking at the same time? That would explain those horrible pauses. On real hardware, the intro scene shows up pretty quickly. Then the credits, also quick. Then ESC and ... the loading takes like a minute!
Could you measure the time, e.g. with a stopwatch? That way we'll get a proper comparison of the time from the title screen to game start on a real machine versus an emulated 32 MHz Falcon (GEMDOS HD: ~20s, IDE image: 6-7min, floppy image: hours...).
mikro wrote: Sat Mar 25, 2023 4:43 pm I see that the disk is barely touched (i.e. everything has already been cached by the HD driver), so it's code only. And that's on a 66 MHz 68060 CPU. Then finally the loading icon and again an incredibly long pause; it nearly looks like it has crashed.
NOTE: because I'm doing this under Hatari GEMDOS HD emulation, actual disk reads are instant from the (profiled) emulation point of view. But if needed, I could put the Gobliins demo data files onto a floppy image and profile things with disk reads done through cycle-accurate FDC emulation...
I'm pretty sure a simple HD image would suffice, see above.
Floppy (image)

First I used a floppy image + cycle-accurate FDC emulation, as I was interested in the worst case. That adds both OS-side disk handling and real HW access latency. After pressing ESC on the title screen and waiting over 2 hours for the "load" gfx to appear, I gave up and concluded that ScummVM is completely unusable when the game data files are on a floppy disk.

(I'm not providing profile data for that, because 2h is long enough for the profiling counters to wrap around.)

Next I tried what you suggested and that completed in more reasonable time...

Hard disk (image)

90% of the cycles spent during the 5-minute period when the "Load" / "Loading" image shows on screen are on the OS side, and 90% of those cycles go to handling Fseek() calls, 6-7% to handling Fread() calls:
gob1-loading.png
=> Is it possible to increase the MiNTlib FILE* operation buffers so that fewer of the seek()s need to hit the OS side?

A bit over half of the 1 min 15 s of black screen before the "Loading" screen also goes to seeking, and a bit less than half goes to reading the data. See the callgraph PDF:
gob1-before-load.pdf
Cost of depacking the data is insignificant, less than 1%.

Note: unlike other profiles, the above ones do not include OS-side symbols, because you use an HD driver + MiNT + some MiNT file system on top of that, whereas I'm using just EmuTOS, so they would not be comparable.
mikro wrote: Sat Mar 25, 2023 4:43 pm There are basically two separate things in Gobliiins to investigate:
- why the loading between credits - load icon - game play is so slow
- why even doing nothing but moving cursor is so INCREDIBLY SLOW even on 060
OK, I profiled just moving the cursor. This gave:

Code:

Time spent in profile = 52.82431s.
...
Used cycles:
  56.21%  56.35%  60.17%   952738487  955037132 1019830635   Gob::Surface::blit(Gob::Surface const&, short, short, short, short, short, short, int)
   7.71%   7.74%  62.90%   130744933  131149259 1066091456   Gob::Scenery::updateStatic(short, unsigned char, unsigned char) [clone .part.0]
   3.93%                    66547224                       c2p1x1_8_start
   3.57%                    60431357                       c2p1x1_8_rect_start
   3.11%                    52711629                       c2p1x1_8_pix16
   2.74%                    46392910                       c2p1x1_8_rect_pix16
   2.65%                    44967562                       copy256_d
   2.64%                    44672118                       copy256
   1.33%                    22486919                       rate.o
   1.25%                    21146948                       _termuser
...
Instruction cache misses:
  34.93%  35.07%  44.60%     8311839   8344632  10610298   Gob::Scenery::updateStatic(short, unsigned char, unsigned char) [clone .part.0]
   9.23%                     2195622                       _termuser
...
Visits/calls:
   9.15%              648000             c2p1x1_8_start
   9.14%              647808             c2p1x1_8_pix16
   7.92%              561036             c2p1x1_8_rect_start
   7.91%              560212             c2p1x1_8_rect_pix16
   5.22%              369971             copy16
I.e. it's doing scenery updates, and 60% of the CPU cycles go to the Gob::Surface::blit() method indicated above:
gob1-cursor-move.png
(Cost from the other two Gob::Surface::blit() methods is completely insignificant.)

If somebody wants to improve it, I suggest saving a profile of it and comparing the annotated disassembly of that blit() function to its C code...
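Without the annotated disassembly I can only guess where blit() spends its cycles. One common pattern worth checking for is a per-pixel loop (with a transparency test) that runs even for opaque copies. Below is a hypothetical C++ sketch of the alternative. Surface8 and this blit() are not Gob::Surface. The convention that a negative transp means "no transparent colour" is an assumption modelled on the int parameter in the profiled signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using std::uint8_t;

// Hypothetical 8bpp chunky surface; not Gob::Surface.
struct Surface8 {
    int width, height;
    std::vector<uint8_t> pixels;
    Surface8(int w, int h) : width(w), height(h), pixels(std::size_t(w) * h, 0) {}
    uint8_t *row(int y) { return pixels.data() + std::size_t(y) * width; }
    const uint8_t *row(int y) const { return pixels.data() + std::size_t(y) * width; }
};

// Copies the inclusive source rectangle [left..right] x [top..bottom] of `src`
// to (x, y) in `dst` (assumed not to overlap). Assumed convention: transp < 0
// means "no transparent colour", otherwise pixels equal to transp are skipped.
void blit(Surface8 &dst, const Surface8 &src, int left, int top, int right, int bottom,
          int x, int y, int transp) {
    // Clip against the source surface.
    if (left < 0) { x -= left; left = 0; }
    if (top < 0)  { y -= top;  top = 0; }
    right  = std::min(right,  src.width - 1);
    bottom = std::min(bottom, src.height - 1);
    // Clip against the destination surface.
    if (x < 0) { left -= x; x = 0; }
    if (y < 0) { top -= y;  y = 0; }
    right  = std::min(right,  left + (dst.width - 1 - x));
    bottom = std::min(bottom, top + (dst.height - 1 - y));
    if (right < left || bottom < top)
        return;

    const int w = right - left + 1;
    for (int j = 0; j <= bottom - top; ++j) {
        const uint8_t *s = src.row(top + j) + left;
        uint8_t *d = dst.row(y + j) + x;
        if (transp < 0) {
            std::memcpy(d, s, std::size_t(w));   // opaque: one bulk copy per row
        } else {
            const uint8_t key = uint8_t(transp);
            for (int i = 0; i < w; ++i)          // per-pixel test only when needed
                if (s[i] != key)
                    d[i] = s[i];
        }
    }
}
```

Whether this helps depends on what the real method's hot loop looks like; the per-instruction cycle counts in the saved profile would tell.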
Eero Tamminen
Fuji Shaped Bastard
Posts: 3108
Joined: Sun Jul 31, 2011 1:11 pm

Re: ScummVM/Falcon060 pre-release

Post by Eero Tamminen »

mikro wrote: Sat Mar 25, 2023 9:17 pm you've mentioned some performance regression with 2.6.1 vs 2.6.2: "VGA (DOS) full SOMI version has very long pauses in cursor interaction, same as EGA version. Neither would be playable on TT" (btw, the EGA version is rendered in 640x400 instead of 320x200, so there's your slowdown mentioned earlier...)
SOMI perf issues were due to OPL2 emulation. If one disables music, both versions work fine.

mikro wrote: Sat Mar 25, 2023 9:17 pm Eero: here is a snapshot of the latest scummvm master tree: https://mikro.naprvyraz.sk/private/scum ... 230326.zip (incl. dlmalloc, excl. asserts). Delete any old scummvm.ini files as well as the old data folder, just to be sure.
Using the default/builtin theme (instead of the modern theme I used with the previous version):
  • Checkbox 'x' marks are visible (in medium scale)
  • Mouse pointer looks now fine after render mode / vsync changes
  • Pause with Space does not change mode any more (which was annoying in Hatari)
  • TT not supported yet... :-)
mikro wrote: Sat Mar 25, 2023 9:17 pm If you like, you can test:
- how well the default (triple buffer) rendering performs (it should be a bit slower than single buffer but waaay better than the previous -fullscreen- implementation) ... especially when comparing in The Dig: viewtopic.php?p=443822#p443822;
Tested The Dig intro.


Single buffer

Seems 5s faster:

Code:

Time spent in profile = 226.32141s.
...
Used cycles:
  25.97%  26.03%  27.87%  1885565094 1889898632 2023434498   Scumm::AkosRenderer::byleRLEDecode(Scumm::BaseCostumeRenderer::ByleRLEData&)
  13.47%  13.50%  13.50%   978317238 980558328 980558328   Scumm::SmushDeltaBlocksDecoder::proc4WithoutFDFE(unsigned char*, unsigned char const*, int, int, int, int, short*)
  11.76%                   853635497                       rate.o
   3.89%                   282204870                       _termuser
   3.16%                   229496446                       Scumm::IMuseDigiInternalMixer::mixBits8ConvertToStereo(unsigned char*, int, int, int, int*, int*, bool)
   2.24%                   162896369                       copy256
   2.23%   2.24%   2.24%   162017326 162425816 162425816   Scumm::ScummEngine::remapPaletteColor(int, int, int, int)
   2.07%   2.07%   2.07%   150270120 150629380 150629380   Scumm::IMuseDigiInternalMixer::loop(unsigned char**, int)
   2.01%                   146216958                       Scumm::CharsetRendererV7::drawCharV7(unsigned char*, Common::Rect&, int, int, int, short, Scumm::TextStyleFlags, unsigned char)
   2.00%                   145342591                       Audio::makePacketizedRawStream(int, unsigned char)
   1.69%                   122474696                       _enter
   1.60%   1.60%   1.60%   116269601 116530737 116530737   Scumm::ScummEngine::testGfxUsageBit(int, int)
   1.51%                   109591071                       c2p1x1_8_rect_start
   1.14%   1.14%   5.37%    82477187  82664748 390084262   Scumm::ScummEngine::resetActorBgs()
   1.13%                    81986011                       c2p1x1_8_rect_pix16
   1.06%                    77212300                       Common::unlockMemoryPoolMutex()
   0.85%                    62077038                       common2
   0.84%   0.84%   2.59%    60753483  60896780 188419035   Scumm::ScummEngine::getResourceAddress(Scumm::ResType, unsigned short)
   0.81%                    58871920                       c2p1x1_8_start
   0.76%   7.53%   7.53%    55141860 546901646 546970917   _get_sysvar
   0.75%   0.75%   2.88%    54682423  54798155 208836352   Scumm::Gdi::resetBackground(int, int, int)
   0.71%   0.72%   0.73%    51826269  51939145  52987450   Scumm::ResourceManager::createResource(Scumm::ResType, unsigned short, unsigned int)
   0.67%                    48962091                       CARTRIDGE
   0.65%                    47113728                       copy16
   0.64%                    46748815                       c2p1x1_8_pix16
...
Visits/calls:
   5.26%             1993120             __etext
   5.26%             1993120             CARTRIDGE
   4.94%             1871569             _enter
   4.60%             1745008             common2
   4.60%             1744999             exit
   4.60%             1744962             common
   4.60%   4.60%     1744921   1742992   _memmove.2
   4.59%             1739736             exit_d2
   4.31%             1635992             less4
   4.31%             1635976             less2
   4.31%             1635961             less256
   4.31%             1635957             both_even
   3.84%   3.84%     1457650   1457650   Scumm::ScummEngine::testGfxUsageBit(int, int)
   3.23%             1224040             _termuser
   2.55%              966683             c2p1x1_8_rect_start
   2.54%              962272             c2p1x1_8_rect_pix16
   2.46%              931711             copy16
   1.61%              611981             .return.2
   1.61%              611973             .get_usermode
   1.61%   3.23%      611966   1224000   __clock
   1.61%   1.61%      611959    612072   _get_sysvar
   1.51%              573920             c2p1x1_8_start
   1.51%              573748             c2p1x1_8_pix16
   1.08%              408346             copy256
   0.71%              268584             backends/platform/atari/osystem_atari.o

Triple buffer

I thought I had enabled Vsync for it, but after restarting ScummVM it was off. 2s faster than the old version (due to Vsync being off?):

Code:

Time spent in profile = 240.78616s.
...
Used cycles:
  26.28%  26.34%  28.07%  2030065769 2034718889 2168511313   Scumm::AkosRenderer::byleRLEDecode(Scumm::BaseCostumeRenderer::ByleRLEData&)
  12.68%  12.71%  12.71%   979471715 981717770 981717770   Scumm::SmushDeltaBlocksDecoder::proc4WithoutFDFE(unsigned char*, unsigned char const*, int, int, int, int, short*)
  11.24%                   868569322                       rate.o
   3.59%                   277614339                       _termuser
   2.92%                   225781367                       Scumm::IMuseDigiInternalMixer::mixBits8ConvertToStereo(unsigned char*, int, int, int, int*, int*, bool)
   2.88%                   222496308                       c2p1x1_8_rect_start
   2.15%                   166056548                       c2p1x1_8_rect_pix16
   2.09%   2.10%   2.10%   161785721 162161920 162161920   Scumm::ScummEngine::remapPaletteColor(int, int, int, int)
   1.94%   1.95%   1.95%   150090311 150415346 150415346   Scumm::IMuseDigiInternalMixer::loop(unsigned char**, int)
   1.90%                   146840551                       copy256
   1.88%                   145185709                       Audio::makePacketizedRawStream(int, unsigned char)
   1.58%                   122381034                       Scumm::CharsetRendererV7::drawCharV7(unsigned char*, Common::Rect&, int, int, int, short, Scumm::TextStyleFlags, unsigned char)
   1.56%                   120609433                       _enter
   1.52%   1.53%   1.53%   117770377 118059269 118059269   Scumm::ScummEngine::testGfxUsageBit(int, int)
   1.50%                   116155834                       c2p1x1_8_start
   1.19%                    92141673                       c2p1x1_8_pix16
   1.08%   1.08%   5.12%    83177284  83294751 395503308   Scumm::ScummEngine::resetActorBgs()
   1.02%                    79185365                       Common::unlockMemoryPoolMutex()
   0.82%                    63123388                       common2
   0.76%   0.77%   2.45%    59034768  59168761 189586712   Scumm::ScummEngine::getResourceAddress(Scumm::ResType, unsigned short)
   0.72%   0.72%   2.76%    55846684  55967168 212925202   Scumm::Gdi::resetBackground(int, int, int)
   0.70%   6.96%   6.96%    54211535 537735020 537827590   _get_sysvar
   0.67%   0.67%   0.69%    51834248  51944079  53427761   Scumm::ResourceManager::createResource(Scumm::ResType, unsigned short, unsigned int)
   0.62%                    48240064                       copy16
   0.62%                    48094961                       CARTRIDGE
...
Visits/calls:
   4.74%             1962635             __etext
   4.74%             1962635             CARTRIDGE
   4.72%             1957373             c2p1x1_8_rect_start
   4.70%             1948406             c2p1x1_8_rect_pix16
   4.44%             1840907             _enter
   4.29%             1777151             exit
   4.29%             1777146             common2
   4.29%             1777104             common
   4.29%   4.28%     1777064   1775114   _memmove.2
   4.27%             1771827             exit_d2
   4.01%             1663557             less4
   4.01%             1663546             less2
   4.01%             1663527             less256
   4.01%             1663511             both_even
   3.56%   3.56%     1475126   1475126   Scumm::ScummEngine::testGfxUsageBit(int, int)
   2.90%             1203586             _termuser
   2.73%             1131840             c2p1x1_8_start
   2.73%             1131485             c2p1x1_8_pix16
   2.29%              950420             copy16
   1.45%              601765             .return.2
   1.45%   1.45%      601751    601874   _get_sysvar
   1.45%              601740             .get_usermode
   1.45%   2.90%      601736   1203563   __clock
   0.89%              368232             copy256
   0.68%   0.68%      283522    283522   __tolower
   0.66%              274613             backends/platform/atari/osystem_atari.o
mikro wrote: Sat Mar 25, 2023 9:17 pm - whether our gfxbit and sciversion friends are still there
Xmas card 1988 (SCI engine)

Code:

Time spent in profile = 391.78698s.
...
Used cycles:
  13.30%   5.49%  22.53%  1671996553  690080237 2832168959 * Sci::run_vm(Sci::EngineState*)
  12.83%  12.86%  16.86%  1612610279 1616187826 2119434820   Sci::GfxView::draw(Common::Rect const&, Common::Rect const&, Common::Rect const&, short, short, unsigned char, unsigned short, bool, unsigned short)
   7.22%   7.23%   9.37%   907312384  909267305 1178314065   Sci::reg_t::getOffset() const
   4.66%   4.67%   5.10%   586002804 587231140 641515026   Sci::readPMachineInstruction(unsigned char const*, unsigned char&, short*)
   4.57%                   574216734                       AtariGraphicsManager::updateScreen()
   4.30%   4.31%  13.77%   540372666  541594159 1731054556   Sci::SegManager::getObject(Sci::reg_t) const
   3.91%   3.92%   3.92%   491489613 492522501 492522501   Sci::getSciVersion()
   3.90%   3.91%   3.91%   490660791 491720322 491720322   Sci::GfxView::getMappedColor(unsigned char, unsigned short, Sci::Palette const*, int, int)
   3.51%                   440694950                       ROM_TOS
   3.07%   3.08%   3.08%   386293475 387164946 387164946   Sci::GfxScreen::vectorIsFillMatch(short, short, unsigned char, unsigned char, unsigned char, unsigned char, bool)
   2.36%   2.37%   5.45%   296889758 297722360 685096184   Sci::GfxPicture::vectorFloodFill(short, short, unsigned char, unsigned char, unsigned char)
   1.54%   1.55%  13.98%   194155030  194621811 1757362106   Sci::lookupSelector(Sci::SegManager*, Sci::reg_t, int, Sci::ObjVarRef*, Sci::reg_t*)
   1.49%   1.49%   1.84%   187545097 187925927 231878295   Sci::reg_t::setOffset(unsigned int)
   1.47%   1.47%   8.00%   184353804  184765037 1005595653   Sci::Object::locateVarSelector(Sci::SegManager*, int) const
   1.43%   1.44%   8.12%   180238796  180631596 1021147672   Sci::send_selector(Sci::EngineState*, Sci::reg_t, Sci::reg_t, Sci::reg_t*, int, Sci::reg_t*)
   1.43%   1.43%   1.67%   179205979 179608190 210443025   Sci::SegManager::getSegmentObj(unsigned short) const
   1.34%   1.34%   1.34%   168019100 168373109 168373109   Sci::SciEngine::checkAddressBreakpoint(Sci::reg_t const&)
   1.25%   2.39%   2.39%   157650681 300351334 300360451   Sci::Script::getObject(unsigned int)
   1.17%   1.17%   1.46%   146556137 146851770 183686871   Sci::reg_t::getSegment() const
   1.13%                   142031666                       Sci::SegManager::saveLoadWithSerializer(Common::Serializer&)
   1.07%   1.07%   6.47%   133915012 134234827 813462431   Sci::Object::getClass(Sci::SegManager*) const
   0.88%   0.88%   1.77%   110145925 110377995 222017879   Sci::Script::offsetIsObject(unsigned int) const
   0.83%                   103856106                       c2p1x1_8_rect_start
   0.75%   0.12%   0.12%    94139897  15379479  15379479 * ___free
   0.73%   0.73%   0.93%    91641073  91841805 116923871   Sci::reg_t::setSegment(unsigned short)
   0.72%   0.72%   1.22%    90211845  90399953 153689052   Sci::READ_SCI11ENDIAN_UINT16(void const*)
   0.70%   0.70%   0.70%    88313309  88509729  88520685   Sci::GfxPaint16::fillRect(Common::Rect const&, short, unsigned char, unsigned char, unsigned char)
   0.65%                    82214447                       scopy_d
   0.61%                    76871591                       c2p1x1_8_rect_pix16
   0.60%                    75531280                       scopy
   0.59%   0.59%   0.59%    74206225  74349735  74349735   ___malloc
...
Visits/calls:
  27.49%  27.49%    32309668  32309668   Sci::getSciVersion()
  13.42%  26.83%    15767961  31533491   Sci::reg_t::getOffset() const
   3.18%   6.36%     3737405   7474397   Sci::reg_t::setOffset(unsigned int)
   3.12%   3.12%     3664039   3664039   Sci::GfxView::getMappedColor(unsigned char, unsigned short, Sci::Palette const*, int, int)
   3.03%   6.07%     3565808   7131259   Sci::reg_t::getSegment() const
=> As you can see from The Dig data and the above, both are still there.

mikro wrote: Sat Mar 25, 2023 9:17 pm - how well the 8bpp overlay performs (I've committed a few optimisations into master when it comes to image loading)
They definitely seem faster, but I do not know how much of that is due to using the builtin theme. Invoking the overlay dialog in Full Throttle now requires Ctrl-F5. And as the dialogs are 8-bpp, the colors of the game gfx in the background change while the overlay is open, which was initially slightly distracting. I much prefer things remaining in the same video mode though!

mikro wrote: Sat Mar 25, 2023 9:17 pm - whether you see some speedup/slowdown in general
Based on the Dig intro length, things are now slightly faster. But in Gabriel Knight, screen transitions may be slower (a feeling, not numbers).

mikro wrote: Sat Mar 25, 2023 9:17 pm - whether you can still observe some cursor/redraw errors
Changing cursor with right click in Gabriel Knight does not leave garbage on screen any more (mostly, see below).

mikro wrote: Sat Mar 25, 2023 9:17 pm It's possible that there's one hard-to-catch race condition when using double/triple buffering. It looks as if one buffer has "forgotten" to update, i.e. if something is zooming, you see leftovers from the previous frame. It's incredibly rare, and if you detect it, please tell me. (I'll try to think about it even more; I suspect it has something to do with the combination of a quick (a few pixels) screen update and the delayed setting of the screen address in the VBL, but I can't be sure. It can also be some ARAnyM quirk, as I haven't seen it on real hardware yet.)
There's some garbage left on screen from the cursor when it's partly outside the screen update area:
gabriel-garbage.png
This happens in single buffer mode though...
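Here is one possible cause for this kind of leftover, offered as a guess rather than a diagnosis. If the old cursor rectangle is clipped to the area the game updated instead of to the whole screen, the part of the cursor outside that area is never restored. Below is a minimal C++ sketch of the dirty region a cursor move needs, with hypothetical Rect/cursorDirtyRect() names and half-open rectangles:

```cpp
#include <algorithm>
#include <cassert>

// Half-open rectangle: [left, right) x [top, bottom). Hypothetical type.
struct Rect {
    int left, top, right, bottom;
    bool isEmpty() const { return left >= right || top >= bottom; }
};

static Rect intersect(const Rect &a, const Rect &b) {
    return Rect{std::max(a.left, b.left), std::max(a.top, b.top),
                std::min(a.right, b.right), std::min(a.bottom, b.bottom)};
}

// Bounding box of two rectangles; an empty input is ignored.
static Rect unite(const Rect &a, const Rect &b) {
    if (a.isEmpty()) return b;
    if (b.isEmpty()) return a;
    return Rect{std::min(a.left, b.left), std::min(a.top, b.top),
                std::max(a.right, b.right), std::max(a.bottom, b.bottom)};
}

// Area to restore from the background and redraw when the cursor moves.
// Both cursor rectangles are clipped to the whole screen, not to the area the
// game happened to update this frame, so a cursor hanging over the edge of
// that area is still erased completely.
Rect cursorDirtyRect(const Rect &oldCursor, const Rect &newCursor, const Rect &screen) {
    return unite(intersect(oldCursor, screen), intersect(newCursor, screen));
}
```

Using a single bounding box overdraws when the cursor jumps far; keeping the two clipped rectangles separate avoids that at the cost of two restore operations.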
Eero Tamminen
Fuji Shaped Bastard
Posts: 3108
Joined: Sun Jul 31, 2011 1:11 pm

Re: ScummVM/Falcon060 pre-release

Post by Eero Tamminen »

Tested the new "low latency audio" option in Full Throttle, which is supposed to affect iMuse functionality.

iMuse methods with low latency disabled:

Code:

Time spent in profile = 141.05131s.

Used cycles:
   5.07%                   229257533                       Scumm::IMuseDigiInternalMixer::mixBits8ConvertToStereo(unsigned char*, int, int, int, int*, int*, bool)
   2.67%   2.68%   2.68%   120971253 121253391 121253391   Scumm::IMuseDigiInternalMixer::loop(unsigned char**, int)
   0.32%   5.40%   5.40%    14404186 244202979 244202979   Scumm::IMuseDigiInternalMixer::mix(unsigned char*, int, int, int, int, int, int, int, bool)
   0.14%   0.14%   4.20%     6171777   6188007 190188942   Scumm::SmushPlayer::sendAudioToDiMUSE(unsigned char*, int, int, int, int, int)
   0.10%   0.16%  10.61%     4337520   7418795 480023993   Scumm::IMuseDigital::tracksCallback()
   0.08%   0.10%   1.29%     3707449   4383300  58350510   Scumm::IMuseDigital::waveOutWrite(unsigned char**, int&, int&)
   0.07%   0.08%   4.07%     3382814   3411757 184007836   Scumm::IMuseDigital::receiveAudioFromSMUSH(unsigned char*, int, int, int, int, int, bool)
   0.07%   0.07%  10.79%     2999726   3004214 488517300   Scumm::IMuseDigital::diMUSEHeartbeat()

Visits/calls:
   0.14%               32853             Scumm::IMuseDigiInternalMixer::mixBits8ConvertToStereo(unsigned char*, int, int, int, int*, int*, bool)
   0.14%   0.14%       32853     32853   Scumm::IMuseDigiInternalMixer::mix(unsigned char*, int, int, int, int, int, int, int, bool)
   0.13%   0.25%       29928     59853   Scumm::IMuseDigital::receiveAudioFromSMUSH(unsigned char*, int, int, int, int, int, bool)
   0.13%   0.38%       29927     89775   Scumm::SmushPlayer::sendAudioToDiMUSE(unsigned char*, int, int, int, int, int)
And with low latency enabled:

Code:

Time spent in profile = 156.37515s.

Used cycles:
   4.54%                   227909519                       Scumm::IMuseDigiInternalMixer::mixBits8ConvertToStereo(unsigned char*, int, int, int, int*, int*, bool)
   3.58%   3.58%   3.58%   179533954 179869111 179869111   Scumm::IMuseDigiInternalMixer::loop(unsigned char**, int)
   0.14%   4.70%   4.70%     7138160 235592783 235592783   Scumm::IMuseDigiInternalMixer::mix(unsigned char*, int, int, int, int, int, int, int, bool)
   0.11%   0.18%  10.41%     5588763   8864381 522224767   Scumm::IMuseDigital::tracksLowLatencyCallback()
   0.06%   0.06%  10.60%     3174071   3179429 531641735   Scumm::IMuseDigital::diMUSEHeartbeat()
   0.06%   0.06%   3.48%     3069813   3074932 174784006   Scumm::SmushPlayer::sendAudioToDiMUSE(unsigned char*, int, int, int, int, int)

Visits/calls:
   0.16%   0.16%       33575     33815   Scumm::IMuseDigiInternalMixer::getStream(int)
   0.08%   0.08%       16511     16511   Scumm::IMuseDigiInternalMixer::mix(unsigned char*, int, int, int, int, int, int, int, bool)
   0.08%               16508             Scumm::IMuseDigiInternalMixer::mixBits8ConvertToStereo(unsigned char*, int, int, int, int*, int*, bool)
   0.07%   0.14%       14975     29950   Scumm::IMuseDigital::receiveAudioFromSMUSH(unsigned char*, int, int, int, int, int, bool)
   0.07%   0.21%       14974     44922   Scumm::SmushPlayer::sendAudioToDiMUSE(unsigned char*, int, int, int, int, int)
=> Does not seem to help.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3108
Joined: Sun Jul 31, 2011 1:11 pm

Re: ScummVM/Falcon060 pre-release

Post by Eero Tamminen »

Eero Tamminen wrote: Sun Mar 26, 2023 7:41 pm 90% of the cycles spent during the 5-minute period when the Gobliins "Load" / "Loading" image shows on screen are on the OS side, and 90% of those cycles go to handling Fseek() calls, 6-7% to handling Fread() calls:
...
=> Is it possible to increase the MiNTlib FILE* operation buffers so that fewer of the seek()s need to hit the OS side?
The largest Gobliins 1 demo file is 300KB, so you could start by trying a 256KB buffer size and measure the result either by wall-clock (on real HW) or with a counting tracing breakpoint on the OS Fseek call (on the emulator). If that helps, halve the buffer size until things start to slow down again.
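On the C side, the experiment could look roughly like this. It uses the standard setvbuf() call to give a stream a larger, fully buffered buffer. readWithBuffer() and sweepBufferSizes() are hypothetical helpers. The seek-then-read-512-bytes loop only stands in for the engine's access pattern. Whether MiNTlib's fseek() then stays inside the buffer is exactly what the measurement would show.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Reads `path` as "seek, then read `chunk` bytes" steps through a stdio stream
// whose buffer is `bufSize` bytes. Returns the number of bytes read, or -1.
long readWithBuffer(const char *path, std::size_t bufSize, std::size_t chunk) {
    if (chunk == 0)
        return -1;
    std::FILE *f = std::fopen(path, "rb");
    if (!f)
        return -1;
    std::vector<char> buf(bufSize);
    // Must come before any other operation on the stream.
    if (std::setvbuf(f, buf.data(), _IOFBF, buf.size()) != 0) {
        std::fclose(f);
        return -1;
    }
    std::vector<char> data(chunk);
    long total = 0;
    for (long pos = 0;; pos += long(chunk)) {
        if (std::fseek(f, pos, SEEK_SET) != 0)
            break;
        const std::size_t n = std::fread(data.data(), 1, chunk, f);
        total += long(n);
        if (n < chunk)
            break;
    }
    std::fclose(f);   // close while `buf` is still alive
    return total;
}

// Start at 256 KB and halve down to 4 KB, timing each run.
void sweepBufferSizes(const char *path) {
    for (std::size_t size = 256 * 1024; size >= 4 * 1024; size /= 2) {
        const auto t0 = std::chrono::steady_clock::now();
        const long n = readWithBuffer(path, size, 512);
        const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                            std::chrono::steady_clock::now() - t0).count();
        std::printf("buffer %6zu bytes: %ld bytes read in %lld ms\n",
                    size, n, static_cast<long long>(ms));
    }
}
```

setvbuf() has to be called before any other operation on the stream, and the buffer must stay valid until fclose().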

If it does not help, somebody needs to look into MiNTlib FILE buffering and debug why the fseek() implementation calls the OS although the data should already be buffered...
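For whoever looks into it, this C++ sketch shows the behaviour one would expect from a buffered FILE*. I have not checked how MiNTlib actually implements it. A seek that lands inside the window already read only moves a pointer, and the OS is asked to seek and refill only when the target falls outside. FakeOsFile stands in for GEMDOS Fseek()/Fread() and counts the calls; both classes are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for the OS side (GEMDOS Fseek()/Fread()), counting calls.
struct FakeOsFile {
    std::vector<unsigned char> data;
    long pos = 0;
    int seeks = 0, reads = 0;

    void osSeek(long p) { ++seeks; pos = p; }
    std::size_t osRead(unsigned char *dst, std::size_t n) {
        ++reads;
        const std::size_t avail =
            pos < long(data.size()) ? data.size() - std::size_t(pos) : 0;
        n = std::min(n, avail);
        if (n > 0)
            std::memcpy(dst, data.data() + pos, n);
        pos += long(n);
        return n;
    }
};

// Expected buffered-stream behaviour: seek() is lazy, and read() only goes to
// the OS when the position lies outside the window currently in the buffer.
class BufferedReader {
public:
    BufferedReader(FakeOsFile &f, std::size_t bufSize) : _f(f), _buf(bufSize) {}

    void seek(long p) { _pos = std::max(p, 0L); }   // no OS call here

    std::size_t read(unsigned char *dst, std::size_t n) {
        std::size_t done = 0;
        while (done < n) {
            if (_pos < _bufStart || _pos >= _bufStart + long(_bufLen)) {
                if (!refill())
                    break;                           // end of file
            }
            const std::size_t off = std::size_t(_pos - _bufStart);
            const std::size_t k = std::min(n - done, _bufLen - off);
            std::memcpy(dst + done, _buf.data() + off, k);
            done += k;
            _pos += long(k);
        }
        return done;
    }

private:
    bool refill() {
        if (_f.pos != _pos)
            _f.osSeek(_pos);                         // seek only if the OS is elsewhere
        _bufStart = _pos;
        _bufLen = _f.osRead(_buf.data(), _buf.size());
        return _bufLen > 0;
    }

    FakeOsFile &_f;
    std::vector<unsigned char> _buf;
    long _bufStart = 0;
    long _pos = 0;
    std::size_t _bufLen = 0;
};
```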
mikro
Hardware Guru
Posts: 3302
Joined: Sat Sep 10, 2005 11:11 am
Location: Kosice, Slovakia

Re: ScummVM/Falcon060 pre-release

Post by mikro »

Eero Tamminen wrote: Sun Mar 26, 2023 11:26 pm If it does not help, somebody needs to look into MiNTlib FILE buffering and debug why the fseek() implementation calls the OS although the data should already be buffered...
I've made a note of it. Most likely the best approach would be to study the code and just depack/preload it in one go. Not at the top of my list, but still interesting to do.
Eero Tamminen wrote: Sun Mar 26, 2023 10:59 pm Tested the new "low latency audio" option in Full Throttle, which is supposed to affect iMuse functionality.
[..]
=> Does not seem to help.
Interesting. The release notes also mention a couple of other games, maybe worth testing: "Added a low latency audio mode to Full Throttle, The Dig and The Curse of Monkey Island; this can improve audio performance especially in non-desktop devices, but it is also a little less accurate than the original." To be fair, it says "it *can* improve". :)

Gabriel Knight: will take a look. The fact that it happens in single buffer mode is even more interesting, since double/triple buffering can cheat and clear the rectangle from other sources.

Built-in theme: feel free to revert back to the original one (Options -> GUI -> Theme selection). This is just the default, for faster initial startup.

Ok, I'll move Sci::reg_t::getOffset() (and friends), Sci::getSciVersion() (this will be fun as it is a C static variable) and Scumm::ScummEngine::testGfxUsageBit() into header files. But isn't their CPU utilisation a bit lower now?

I guess I'll do the inlining above and provide a new snapshot. Apart from that, it'll take a while before there's something new, because that "TT rework" (arbitrary resolution support in general) is not a small job (and as a bonus, I will have to think about how to optimally render into 320x480...). Most likely the 4bpp & 6bpp C2Ps will be included along the way, if easy enough. I did test it, and yes, CGA and EGA versions can be used with 4 bits per pixel, but since they are rendered into 640x400 they cancel the speedup: 640x400 = 4 times more data, 4bpp = half the data to render, so all in all... twice as much data to render. And the 6bpp C2P suitable for Amiga games (64 colours) is also questionable: I briefly checked the pixel values and they do get bigger than 63. So it's possible that the Amiga versions were using hardware tricks to increase their palette size.
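To make the data-size argument concrete, here is a small sketch (hypothetical names, plain C++ rather than the backend's optimised assembly C2P routines) of what a chunky-to-planar conversion produces for Atari-style interleaved bitplanes: every group of 16 pixels becomes one 16-bit word per plane, with pixel 0 in the most significant bit. At 4 planes a 640x400 screen is 128000 bytes, versus 64000 bytes for 320x200 at 8 planes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference (slow but readable) chunky-to-planar conversion of one 16-pixel
// group into Atari-style interleaved bitplanes: `planes` consecutive words,
// word k holding bit k of every pixel, pixel 0 in bit 15. Bits at or above
// `planes` are dropped, so a 6-plane conversion cannot show values > 63.
void c2p16(const std::uint8_t chunky[16], std::uint16_t *out, int planes) {
    assert(planes >= 1 && planes <= 8);
    for (int k = 0; k < planes; ++k) {
        unsigned word = 0;
        for (int i = 0; i < 16; ++i)
            word |= ((chunky[i] >> k) & 1u) << (15 - i);
        out[k] = static_cast<std::uint16_t>(word);
    }
}

// Bytes of screen memory for a w x h frame at `bpp` bits per pixel.
constexpr std::size_t planarBytes(std::size_t w, std::size_t h, std::size_t bpp) {
    return w * h * bpp / 8;
}
```

The dropped high bits are why pixel values above 63 are a problem for a 6-plane target: they would silently map to a different colour.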
Eero Tamminen
Fuji Shaped Bastard
Posts: 3108
Joined: Sun Jul 31, 2011 1:11 pm

Re: ScummVM/Falcon060 pre-release

Post by Eero Tamminen »

mikro wrote: Mon Mar 27, 2023 6:27 am
Eero Tamminen wrote: Sun Mar 26, 2023 11:26 pmIf it does not help, somebody needs to look into MiNTlib FILE buffering and debug why fseek() implementation calls OS although data should be already buffered in...
I've taken a note about it. Most likely the best approach would be to study the code and just depack/preload it in one go. It's not at the top of my list, but still interesting to do.
It's possible that other games / game engines have similar issues. Using a larger file buffer could help all of them. ScummVM already requires a lot of RAM, so a bit extra should not be a problem. On the other hand, a fix to the game engine would benefit all ScummVM users, not just the Atari niche. So... Maybe do (eventually) both? :-)
mikro wrote: Mon Mar 27, 2023 6:27 am
Eero Tamminen wrote: Sun Mar 26, 2023 10:59 pm Tested the new "low latency audio" option in Full Throttle, which is supposed to affect iMuse functionality.
[..]
=> Does not seem to help.
Interesting. The release notes also mention a couple of other games, maybe worth testing: "Added a low latency audio mode to Full Throttle, The Dig and The Curse of Monkey Island; this can improve audio performance especially in non-desktop devices, but it is also a little less accurate than the original." To be fair, it says "it *can* improve". :)
I'm using a 2kB buffer with 12 kHz audio to make speech bearable on a 32 MHz 030. Maybe the low-latency performance improvement and audio quality drop would be visible only with a higher frequency and a (time-wise) smaller buffer?
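Back-of-the-envelope arithmetic for the buffer question, as a sketch assuming the "2kB" setting means 2048 sample frames (if it counts bytes of 16-bit stereo data, the frame count is a quarter of that). The time a buffer covers is frames / rate, so the same buffer size covers less time at a higher output rate. `bufferLatencyMs()` is a hypothetical helper.

```cpp
#include <cassert>

// Milliseconds of audio covered by a buffer of `frames` sample frames played
// back at `rateHz`. Assumes the configured buffer size counts sample frames.
inline double bufferLatencyMs(unsigned frames, unsigned rateHz) {
    assert(rateHz > 0);
    return 1000.0 * frames / rateHz;
}
```

With those assumptions, 2048 frames are roughly 170.7 ms at 12 kHz but only about 46.4 ms at 44.1 kHz.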

mikro wrote: Mon Mar 27, 2023 6:27 am Built-in theme: feel free to revert back to the original one (Options -> GUI -> Theme selection). This is just default, for faster initial startup.
I think my original reason for the theme switch was checkboxes initially not working. I prefer a faster GUI though. IMHO you might even drop the themes from the data dir (when you zip ScummVM for Atari users). That way there are fewer potential issues users would report to you...

mikro wrote: Mon Mar 27, 2023 6:27 am Ok, I'll move Sci::reg_t::getOffset() (and friends), Sci::getSciVersion() (this will be fun as it is a C static variable) and Scumm::ScummEngine::testGfxUsageBit() into header files. But isn't their CPU utilisation a bit lower now?

I guess I'll do the inlining above and provide a new snapshot.
Compared to the first dlmalloc version's results for "Xmas card 1988": viewtopic.php?p=444627#p444627

"Sci::reg_t::getOffset()" dropped from 7.7% to 7.2% and "Sci::getSciVersion()" from 5.6% to 3.9%, so there is a small 13.3% -> 11.1% = 2.2% improvement.

I would first try inlining just the trivial "Sci::getSciVersion()", as inlining everything can bloat the code quite a bit. I'm not sure how much of a win that will be, though, given the need to change the static version variable into a private "Sci" member variable lookup (instruction prefetch works better, but data access needs to use a struct offset).
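A sketch of one way to make the getter inlinable while keeping a plain global instead of a member lookup, assuming the version variable can simply lose its `static` (file-local) linkage. The function and variable names mirror the snippet quoted earlier in the thread, but the `SciSketch` namespace and every enum value other than `SCI_VERSION_NONE` are placeholders, not ScummVM's actual code. The `extern` declaration plus a single definition works with C++11; a C++17 `inline` variable would also do.

```cpp
#include <cassert>

namespace SciSketch {

// Placeholder enum: only SCI_VERSION_NONE is taken from the quoted snippet.
enum SciVersion { SCI_VERSION_NONE, SCI_VERSION_EXAMPLE };

// --- header part ---
// Declared (not defined) here, so every file including the header can read
// the variable directly instead of calling an out-of-line function.
extern SciVersion s_sciVersion;

inline SciVersion getSciVersion() {
    assert(s_sciVersion != SCI_VERSION_NONE); // compiled out with -DNDEBUG
    return s_sciVersion;
}

// --- .cpp part ---
// Exactly one definition, in one translation unit.
SciVersion s_sciVersion = SCI_VERSION_NONE;

} // namespace SciSketch
```

With the asserts compiled out, each inlined call typically reduces to a single load from a fixed address, which is the data-access side of the trade-off mentioned above.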

One more thing that could be checked is the Sci object structure: whether it's efficiently packed (no holes, items used together packed together) for the 030 and its minuscule data cache. Or I could take a look at the latter with Linux Valgrind (using ScummVM 2.2 on my PC)?
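What "holes" means in practice, as a sketch with two hypothetical structs (not ScummVM's actual Sci object layout) holding the same four members in different orders. Alternating small and large members forces padding; grouping them removes it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical structs with the same members in two orders.
struct Interleaved {
    std::uint8_t  flags;    // padding is inserted after this member
    std::uint32_t offset;
    std::uint8_t  type;     // ...and again after this one
    std::uint32_t segment;
};

struct Grouped {
    std::uint32_t offset;   // large members first
    std::uint32_t segment;
    std::uint8_t  flags;    // small members packed together at the end
    std::uint8_t  type;
};

static_assert(sizeof(Grouped) <= sizeof(Interleaved),
              "grouping members should never make the struct bigger");
```

One caveat for the Valgrind idea: on m68k, GCC aligns 32-bit members to 2 bytes by default (unless `-malign-int` is used), so the holes are smaller there than on an x86-64 PC, and layouts measured on the PC won't match the Atari build exactly.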

mikro wrote: Mon Mar 27, 2023 6:27 am Apart from that it'll take a while for something new because that "TT rework" (arbitrary resolution support in general) is not a small job (and as a bonus, I will have to think about a way how to optimally render into 320x480...). Most likely the 4bpp & 6bpp C2Ps will be included along the way, if easy enough; I did test it and yes, CGA and EGA versions can be used with 4 bits per pixel but since they are rendered into 640x400 they cancel the speedup: 640x400 = 4 times more data, 4bpp = half data to render, so all in all... twice as much data to render.
(Add an option to) convert only every other line? Either by drawing only 640x200, or by doing a fast copy to duplicate lines?
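A minimal sketch of the "fast copy" variant, assuming the C2P pass can be pointed at every other line (i.e. run with a destination pitch of twice the screen pitch). `duplicateEvenLines()` is a hypothetical helper, not existing backend code. Each duplicated line is a plain memcpy(), which is much cheaper than converting the same line a second time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumes the C2P pass already wrote `srcLines` converted lines to the even
// destination lines 0, 2, 4, ... Each odd line becomes a copy of the line above.
void duplicateEvenLines(std::uint8_t *screen, std::size_t pitch, std::size_t srcLines) {
    for (std::size_t y = 0; y < srcLines; ++y) {
        std::uint8_t *even = screen + 2 * y * pitch;
        std::memcpy(even + pitch, even, pitch);  // odd line = copy of the even line
    }
}
```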
mikro wrote: Mon Mar 27, 2023 6:27 am And the 6bpp C2P suitable for Amiga games (64 colours) is also questionable, I briefly check the pixel values and they do get bigger than 63. So it's possible that Amiga versions were using hardware tricks to increase their palette size.
Ouch. :-/
Eero Tamminen
Fuji Shaped Bastard
Posts: 3108
Joined: Sun Jul 31, 2011 1:11 pm

Re: ScummVM/Falcon060 pre-release

Post by Eero Tamminen »

@mikro, I looked at upstream ScummVM commits and noticed that your commits have ended up there!

However, I wonder whether upstreaming the dlmalloc change (https://github.com/scummvm/scummvm/comm ... 50b253df28) was premature, as AFAIK there's no official dlmalloc MiNTlib version yet?
mikro
Hardware Guru
Posts: 3302
Joined: Sat Sep 10, 2005 11:11 am
Location: Kosice, Slovakia

Re: ScummVM/Falcon060 pre-release

Post by mikro »

Eero Tamminen wrote: Mon Mar 27, 2023 5:13 pmupstreaming was premature, as AFAIK there's no official dlmalloc MiNTlib version yet?
Note the /opt/mintlib-dlmalloc prefix. :) I'm the maintainer, so for now it's OK to use a hardcoded path. Later it will either get merged, or it will be possible to configure it in MiNTLib's configvars and build it via their build service.

Check out https://mikro.naprvyraz.sk/private/scum ... 230327.zip ... this is a version with inlined ScummEngine::testGfxUsageBit, Script::offsetIsObject, getSciVersion, reg_t::getOffset() and reg_t::setOffset (all of them are quite short, so there shouldn't be any bloated parts). It also replaces the usleep() and clock() calls with my custom Timer C counter, so you shouldn't see any Super() or _sysvar stuff in your profiling (except maybe getTimeAndDate() and its calls to time() & localtime(), but that shouldn't be called too often, if at all).
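Why Super()/_sysvar calls can disappear from the profile, as a sketch of a millisecond clock driven by a tick counter. It assumes a custom handler increments the counter on every Timer C interrupt (200 Hz on TOS machines, i.e. 5 ms per tick); the handler itself and all names here are hypothetical, not mikro's actual code. The point is that reading the time becomes a plain user-mode memory load.

```cpp
#include <cassert>
#include <cstdint>

// Timer C fires at 200 Hz on TOS machines, so each tick is 5 ms.
constexpr std::uint32_t kTimerCHz = 200;
constexpr std::uint32_t kMsPerTick = 1000 / kTimerCHz;

// Hypothetical counter, incremented by a custom Timer C handler (not shown).
// `volatile` because it changes behind the compiler's back.
volatile std::uint32_t g_timerCTicks = 0;

inline std::uint32_t ticksToMillis(std::uint32_t ticks) {
    // 5 ms resolution; the millisecond value wraps after about 49.7 days.
    return ticks * kMsPerTick;
}

// Reading the time needs no Super() call and no system variable access.
inline std::uint32_t getMillisSketch() {
    return ticksToMillis(g_timerCTicks);
}
```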

If you could go through your previous profiling runs and compare the results, that would be extremely helpful when proposing those changes upstream. :) (You don't have to go through every game, just the biggest offenders.)

In the meantime I'm going to continue with those generic screen size changes.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3108
Joined: Sun Jul 31, 2011 1:11 pm

Re: ScummVM/Falcon060 pre-release

Post by Eero Tamminen »

Went through another batch of ScummVM demo games: https://www.scummvm.org/demos/

Not recognized:
  • The Legend of Kyrandia: Book 3 - Malcolm's Revenge
  • The Realm
Engine missing:
  • Grim Fandango
Failing demos:
  • Waxworks - "WARNING: AGOS failed to instantiate engine: Game data not found" (it did not say which file was missing)
Non-failing demos (in order provided by ScummVM):
  • I Want My C64 Back! -- SCI engine (speed-wise similar to larry 2)
  • The Feeble Files -- constant "WARNING: dropped frame <X>!" warnings (did not wait for it to actually output something)
  • Fun Seeker's Guide -- just a click through joke...
  • Hoyle Classic Games -- too slow to be playable (speed-wise similar to larry 7)
  • Laura Bow I: The Colonel's Bequest -- non-interactive demo (SCI engine), slow
  • Leisure Suit Larry 1 (remake) -- non-interactive demo (SCI engine), slow
  • Pepper's Adventures in Time -- colorful kid's game (SCI engine), very slow
  • Personal Nightmare (Atari ST) -- non-interactive, but seems slow
  • Playtoons: Living Stories -- high res, but speed is OK, and voices almost fine
  • RAMA [1] -- pics & sluggish videos (of e.g. Arthur C Clarke), probably much slower when not on GEMDOS HD
  • Slater & Charlie Go Camping -- Kids' game with "interactive pages", slow
  • Broken Sword: The Shadow of Templars
    • "PSX stream cutscene 'intro.str' cannot be played in paletted mode"
    • Slow and sound echoes horribly, but playable (unlike BS 2)
I will look a bit into the performance of some of these later.

[1] running RAMA showed something very worrying:

Code: Select all

GEMDOS 0x3D Fopen("c:\scummvm\rama-dos\patches\20030.v56", read-only) at PC=0x2C6F49A
-> FD 70 (read-only -> read+write)
GEMDOS 0x3D Fopen("c:\scummvm-.sym", read-only) at PC=0x2C6F49A
-> FD 71 (read-only -> read+write)
GEMDOS 0x3E Fclose(71) at PC 0x2C6ED54
Patching SCUMMVM-.SYM failed - resource type mismatch
GEMDOS 0x3D Fopen("c:\scummvm-.ttp", read-only) at PC=0x2C6F49A
-> FD 71 (read-only -> read+write)
GEMDOS 0x3E Fclose(71) at PC 0x2C6ED54
Patching SCUMMVM-.TTP failed - resource type mismatch
=> Why does the game try to modify the ScummVM binary in the drive root (when the game itself is a few subdirectories deeper)?
Last edited by Eero Tamminen on Tue Mar 28, 2023 6:45 pm, edited 2 times in total.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3108
Joined: Sun Jul 31, 2011 1:11 pm

Re: ScummVM/Falcon060 pre-release

Post by Eero Tamminen »

mikro wrote: Mon Mar 27, 2023 7:58 pm Check out https://mikro.naprvyraz.sk/private/scum ... 230327.zip ... this is a version with inlined ScummEngine::testGfxUsageBit, Script::offsetIsObject, getSciVersion, reg_t::getOffset() and reg_t::setOffset (all of them are quite short so there shouldn't be any bloated parts). Also with replaced usleep() and clock() calls with my custom timer c counter so you shouldn't see any Super() or _sysvar stuff in your profiling (maybe except getTimeAndDate() and its call to time() & localtime() but that shouldn't be called too often, if at all).

If you could go through your previous profilings and compare the results, that would be extremely helpful when proposing those changes upstream. :) (you don't have to go through every game, just the biggest offenders).
I used the good ol' Xmas card 1988 again for comparing this, as it's the only SCI engine demo with a rather long automated period and clear start & end points.

When the OS is counted as a single item, it looks like this with yesterday's ScummVM version:

Code: Select all

Time spent in profile = 391.78698s.
...
Used cycles:
  13.30%   5.49%  22.53%  1671996553  690080237 2832168959 * Sci::run_vm(Sci::EngineState*)
  12.83%  12.86%  16.86%  1612610279 1616187826 2119434820   Sci::GfxView::draw(Common::Rect const&, Common::Rect const&, Common::Rect const&, short, short, unsigned char, unsigned short, bool, unsigned short)
   7.22%   7.23%   9.37%   907312384  909267305 1178314065   Sci::reg_t::getOffset() const
   4.66%   4.67%   5.10%   586002804 587231140 641515026   Sci::readPMachineInstruction(unsigned char const*, unsigned char&, short*)
   4.57%                   574216734                       AtariGraphicsManager::updateScreen()
   4.30%   4.31%  13.77%   540372666  541594159 1731054556   Sci::SegManager::getObject(Sci::reg_t) const
   3.91%   3.92%   3.92%   491489613 492522501 492522501   Sci::getSciVersion()
   3.90%   3.91%   3.91%   490660791 491720322 491720322   Sci::GfxView::getMappedColor(unsigned char, unsigned short, Sci::Palette const*, int, int)
   3.51%                   440694950                       ROM_TOS
And with the new version:

Code: Select all

Time spent in profile = 390.86795s.
...
Used cycles:
  15.22%   6.10%  19.77%  1908626592  764861997 2479752135 * Sci::run_vm(Sci::EngineState*)
  15.16%  15.20%  20.15%  1900739252 1905659887 2526929806   Sci::GfxView::draw(Common::Rect const&, Common::Rect const&, Common::Rect const&, short, short, unsigned char, unsigned short, bool, unsigned short)
   6.04%   6.06%   6.73%   757547929 759504857 843582326   Sci::readPMachineInstruction(unsigned char const*, unsigned char&, short*)
   5.65%   4.00%   6.16%   708354175 501467337 772031261 * Sci::SegManager::getObject(Sci::reg_t) const
   5.55%                   696020125                       AtariGraphicsManager::updateScreen()
   4.84%   4.85%   4.85%   607336631 608856637 608856637   Sci::GfxView::getMappedColor(unsigned char, unsigned short, Sci::Palette const*, int, int)
   3.02%   3.03%   3.03%   379087240 380046677 380046677   Sci::GfxScreen::vectorIsFillMatch(short, short, unsigned char, unsigned char, unsigned char, unsigned char, bool)
   2.59%                   324492990                       ROM_TOS
(I think the 1s run-time difference is just due to me not starting & stopping profiling at exactly the same time.)

As to time handling, instead of usleep(), clock(), get_sysvar() + ROM_TOS taking 3.73%, I now see OSystem_Atari::delayMillis() + ROM_TOS taking 3.8%. That could be due to there being marginally more time for busy waiting, to their share increasing (as others use less), or both.

From the profile one can see run_vm()'s exclusive cost increasing from 5.49% to 6.10% due to the inlining, but its inclusive (total) cost dropping from 22.53% to 19.77%, which looks good.

When comparing the absolute cycle counts between these runs, and not just percentages, things did not look so good though: run_vm()'s between-symbols cost increased by 14% and its exclusive cost by 11%.

Looking closer, it came mostly from this call chain:

Code: Select all

run_vm()
-> GfxAnimate::kernelAnimate()
   -> GfxAnimate::drawCels()
      -> GfxPaint16::drawCel() (4x)
         -> GfxView::draw()
            -> GfxView::getMappedColor() (455x)
The cost of both of the last functions in the call chain had increased noticeably.

However, I then noticed that all of these functions, from run_vm() down to getMappedColor(), had been called 21-24% more often (in run_vm()'s case, 11506 instead of 9386 times).

I.e. with less time being spent on the OS side and on trivial function overhead, the demo had time to do more VM rounds and animations.

When comparing the per-call timings for run_vm(), provided by the Hatari profiler post-processor's "-i" option, the "between-symbols", "exclusive" and "inclusive" costs were earlier:

Code: Select all

Used cycles:
  13.30%   5.49%  22.53%  52.11149s 21.50789s 88.27084s 1671996553  690080237 2832168959 * Sci::run_vm(Sci::EngineState*) (0x2329f64, 0.00555s, 0.00229s, 0.00940s / call)
And now:

Code: Select all

Used cycles:
  15.22%   6.10%  19.77%  59.48659s 23.83862s 77.28699s 1908626592  764861997 2479752135 * Sci::run_vm(Sci::EngineState*) (0x2330190, 0.00517s, 0.00207s, 0.00672s / call)
I.e. the total run_vm() cost went from 0.00940s / call down to 0.00672s / call, which is a 28.5% improvement!

(0.00672 / 0.00940 = 0.7148)
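The per-call figures can be reproduced from the inclusive times and the run_vm() call counts quoted above (9386 and 11506). A small sketch with hypothetical helper names:

```cpp
#include <cassert>

// Inclusive seconds per call, as printed by the profile post-processor.
inline double perCallSeconds(double inclusiveSeconds, unsigned calls) {
    assert(calls > 0);
    return inclusiveSeconds / calls;
}

// Fractional improvement going from `before` to `after` (0.285 == 28.5 %).
inline double improvement(double before, double after) {
    return 1.0 - after / before;
}
```

Using the unrounded values, perCallSeconds(88.27084, 9386) and perCallSeconds(77.28699, 11506) give an improvement of about 28.6 %, marginally above the 28.5 % from the rounded per-call figures, while the number of calls went up by about 22.6 %.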
NovaCoder
Retro freak
Posts: 13
Joined: Fri Apr 26, 2013 12:09 am

Re: ScummVM/Falcon060 pre-release

Post by NovaCoder »

Cool to see this project :)

As far as ScummVM optimizations go, it's worth remembering that each new official release of ScummVM has more features and higher demands. For the old Amiga ports I did, the best performance was probably with v1.2.1, with each new Amiga port getting progressively slower. The last version I ever did was v1.9 (SDL v1 targeted), and it needed an 060 to run at an acceptable speed.

My only real advice on improving the speed of this Atari port is to limit the screen redraws. ScummVM calls 'updateScreen' whenever the cursor is moved, so if you track when the screen data has actually changed independently of when the cursor moves, you can decide in your 'updateScreen' method what actually needs to be redrawn (if anything!).

Something like this:

Code: Select all

void AmigaGraphicsManager::updateScreen() {
#ifndef NDEBUG
    debug(12, "AmigaGraphicsManager::updateScreen()");
#endif

    if (_screenDirty || (_paletteDirtyEnd != 0) || _mouseDirty) {
        if (_screenDirty) {
            CopyMemQuick(_screen.getPixels(), _hwscreen->pixels, _videoMode.screenWidth * _videoMode.screenHeight);
        }

        if (_paletteDirtyEnd != 0) {
            SDL_SetColors(_hwscreen, _currentPalette + _paletteDirtyStart,
                          _paletteDirtyStart,
                          _paletteDirtyEnd - _paletteDirtyStart);

            // Reset.
            _paletteDirtyStart = _paletteDirtyEnd = 0;
        }

        if (_mouseCursor.visible && _mouseCursor.captured) {
            drawMouse();
        }

        SDL_Flip(_hwscreen);

        if (_mouseCursor.visible && _mouseCursor.captured) {
            undrawMouse();
        }

        // Reset.
        _screenDirty = _mouseDirty = false;
    }
}



Bonus mouse hacking code for double-buffered displays:

Code: Select all

// Protected
void AmigaGraphicsManager::drawMouse() {
#ifndef NDEBUG
    debug(14, "AmigaGraphicsManager::drawMouse()");
    assert(_mouseCursor.surface.getPixels());
    assert(_mouseCursorMask.surface.getPixels());
#endif

    // Signed on purpose: clipping against the left/top edges can make these
    // zero or negative, which an unsigned type would silently wrap around.
    int w = _mouseCursor.w;
    int h = _mouseCursor.h;

    int x = (_mouseCursor.x - _mouseCursor.hotX);
    int y = (_mouseCursor.y - _mouseCursor.hotY);

    byte *mousePixels = (byte *)_mouseCursor.surface.getPixels();

    // Clip the coordinates.
    if (x < 0) {
        w += x;
        mousePixels -= x;
        x = 0;
    }

    if (y < 0) {
        h += y;
        mousePixels -= (y * _mouseCursor.surface.pitch);
        y = 0;
    }

    if (w > (int)_videoMode.hardwareWidth - x) {
        w = _videoMode.hardwareWidth - x;
    }

    if (h > (int)_videoMode.hardwareHeight - y) {
        h = _videoMode.hardwareHeight - y;
    }

    if (w <= 0 || h <= 0) {
        // Nothing to draw. Clear the saved rectangle too, so that
        // undrawMouse() doesn't restore a stale area from an earlier frame.
        _mouseCursorMask.w = _mouseCursorMask.h = 0;
        return;
    }

    // Set up the cursor mask (the screen area saved under the cursor).
    _mouseCursorMask.x = x;
    _mouseCursorMask.y = y;
    _mouseCursorMask.w = w;
    _mouseCursorMask.h = h;

    byte *maskPixels = (byte *)_mouseCursorMask.surface.getPixels();

    // Set the starting point of the screen we will be drawing to.
    byte *screenPixels = (byte *)_hwscreen->pixels + (y * _hwscreen->pitch) + x;

    // Draw it.
    do {
        // Save a copy of this row before it's overwritten.
        CopyMem(screenPixels, maskPixels, w);

        for (int c = 0; c < w; c++) {
            byte color = *mousePixels;

            if (color != _mouseCursor.keyColor) {
                // Opaque cursor pixel: overwrite the screen.
                *screenPixels = color;
            }

            // Next column.
            mousePixels++;
            screenPixels++;
        }

        // Next row (using the surface pitch, as for the start offset above).
        maskPixels += w;
        mousePixels += (_mouseCursor.surface.pitch - w);
        screenPixels += (_hwscreen->pitch - w);
    } while (--h);
}

void AmigaGraphicsManager::undrawMouse() {
#ifndef NDEBUG
    debug(14, "AmigaGraphicsManager::undrawMouse()");
#endif

    byte *src = (byte *)_mouseCursorMask.surface.getPixels();
    byte *dst = (byte *)_hwscreen->pixels + (_mouseCursorMask.y * _hwscreen->pitch) + _mouseCursorMask.x;

    // Put the saved rows back (a no-op if drawMouse() clipped everything away).
    for (uint i = 0; i < _mouseCursorMask.h; i++) {
        CopyMem(src, dst, _mouseCursorMask.w);

        dst += _hwscreen->pitch;
        src += _mouseCursorMask.w;
    }
}
Anyway, happy coding :cheers:
mikro
Hardware Guru
Posts: 3302
Joined: Sat Sep 10, 2005 11:11 am
Location: Kosice, Slovakia

Re: ScummVM/Falcon060 pre-release

Post by mikro »

Hmm, nice, nice. I'm thinking that I could perhaps re-enable the FPS "counter" (it prints FPS to the console) so we can actually see some difference. I'll try to run "Xmas card 1988" on real hardware. So all of the inlines have basically disappeared, right?

I'm curious about Scumm, there were some serious offenders.

Hey NovaCoder, welcome! As for cursor vs. screen redraws I'm doing basically that. Somebody had another great idea: ignoring updateScreen() completely and just refreshing screen every 1/60th of second (of course, still with dirty rectangles). That way it doesn't matter what game engines do.

Looking at your cursor drawing again is on my TODO list; I use something totally different (and perhaps not so efficient). If the cursor moves, I:

1. restore rectangle from the chunky buffer to the screen surface
2. draw new cursor (for C2P I first copy background to a surface and then the "transparent" cursor there and then all of it to the screen surface)
3. mark cursor coordinates for restoring

This is done for every buffer separately. Especially step two is waiting for some serious optimisation.
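A rough sketch of those three steps for the plain 8-bit chunky case, with hypothetical `Surface`/`Rect` types rather than the backend's real classes (the C2P case, with its intermediate surface, is left out). Returning an empty rectangle when the cursor is fully clipped means the next restore does nothing, instead of copying back a stale area.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Rect { int x, y, w, h; };

// Hypothetical 8-bit chunky surface; pitch equals width.
struct Surface {
    int w, h;
    std::vector<std::uint8_t> px;
    Surface(int w_, int h_) : w(w_), h(h_), px(std::size_t(w_) * h_) {}
    std::uint8_t *row(int y) { return px.data() + std::size_t(y) * w; }
    const std::uint8_t *row(int y) const { return px.data() + std::size_t(y) * w; }
};

// Step 1: restore the previously covered rectangle from the chunky buffer.
void restoreRect(Surface &screen, const Surface &chunky, const Rect &r) {
    for (int y = r.y; y < r.y + r.h; ++y)
        std::memcpy(screen.row(y) + r.x, chunky.row(y) + r.x, std::size_t(r.w));
}

// Steps 2 and 3: draw the cursor at (cx, cy), skipping the key colour and
// clipping to the screen, and return the rectangle to restore next time.
Rect drawCursor(Surface &screen, const Surface &cursor, int cx, int cy, std::uint8_t key) {
    const int x0 = std::max(cx, 0), y0 = std::max(cy, 0);
    const int x1 = std::min(cx + cursor.w, screen.w);
    const int y1 = std::min(cy + cursor.h, screen.h);
    if (x1 <= x0 || y1 <= y0)
        return Rect{0, 0, 0, 0};  // fully off-screen: nothing to restore later
    for (int y = y0; y < y1; ++y) {
        const std::uint8_t *src = cursor.row(y - cy);
        std::uint8_t *dst = screen.row(y);
        for (int x = x0; x < x1; ++x)
            if (src[x - cx] != key)
                dst[x] = src[x - cx];
    }
    return Rect{x0, y0, x1 - x0, y1 - y0};
}
```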

Btw, speed is not that bad in the present versions. For instance, one pull request which was supposed to dramatically accelerate a few SCUMM games on an 030 Amiga now has zero effect, and those games are playable on a basic setup without any further changes (e.g. Full Throttle).
NovaCoder
Retro freak
Posts: 13
Joined: Fri Apr 26, 2013 12:09 am

Re: ScummVM/Falcon060 pre-release

Post by NovaCoder »

mikro wrote: Tue Mar 28, 2023 5:53 am Hey NovaCoder, welcome! As for cursor vs. screen redraws I'm doing basically that.
Cool :mrgreen:
mikro wrote: Tue Mar 28, 2023 5:53 am Somebody had another great idea: ignoring updateScreen() completely and just refreshing screen every 1/60th of second (of course, still with dirty rectangles). That way it doesn't matter what game engines do.
That sounds okay, but I personally don't like updating the screen based on a tick rate; it's better to scale it to the CPU speed if possible.
mikro wrote: Tue Mar 28, 2023 5:53 am Looking at your cursor drawing again is on my TODO list; I use something totally different (and perhaps not so efficient). If the cursor moves, I:

1. restore rectangle from the chunky buffer to the screen surface
2. draw new cursor (for C2P I first copy background to a surface and then the "transparent" cursor there and then all of it to the screen surface)
3. mark cursor coordinates for restoring

This is done for every buffer separately. Especially step two is waiting for some serious optimisation.
Sounds very similar to what I did :D
Eero Tamminen
Fuji Shaped Bastard
Posts: 3108
Joined: Sun Jul 31, 2011 1:11 pm

Re: ScummVM/Falcon060 pre-release

Post by Eero Tamminen »

mikro wrote: Tue Mar 28, 2023 5:53 am Hmm, nice, nice. I'm thinking that I could perhaps re-enable the FPS "counter" (it prints FPS to the console) so we can actually see some difference. I'll try to run "Xmas card 1988" on real hardware.
Do you mean using NatFeats to print the FPS output? Or outputting FPS to screen, so that one can see it also on real HW?
mikro wrote: Tue Mar 28, 2023 5:53 am So all of the inlines have basically disappeared, right?
Not sure what you mean by inlines disappearing... Inlines do not have their own symbols, so they do not show up separately in the profiler output, but they still generate code. Based on my profiling above, the end result was clearly faster (due to the compiler being able to optimize it better, I assume).
mikro wrote: Tue Mar 28, 2023 5:53 am I'm curious about Scumm, there were some serious offenders.
Will try The Dig intro next...
jvas
Captain Atari
Posts: 471
Joined: Fri Jan 28, 2005 4:30 pm
Location: Budapest, Hungary

Re: ScummVM/Falcon060 pre-release

Post by jvas »

NovaCoder wrote: Tue Mar 28, 2023 9:22 am
mikro wrote: Tue Mar 28, 2023 5:53 am Somebody had another great idea: ignoring updateScreen() completely and just refreshing screen every 1/60th of second (of course, still with dirty rectangles). That way it doesn't matter what game engines do.
That sounds okay but I don't personally like updating the screen based on a tick rate, it's better to scale it to the CPU speed if possible.
Or just set a flag in updateScreen() and refresh the screen in the next 1/60th of a second if this flag is set (clearing the flag at the same time).
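A sketch of that flag idea, with hypothetical names: updateScreen() only records that a refresh is due, and a 1/60 s tick performs at most one refresh per tick, however many times updateScreen() was called. On the Falcon the tick would come from a VBL or timer handler (or a main-loop check triggered by it); here `onTick()` is called by hand, and std::atomic is used only to make the read-and-clear step explicit. Whether a full dirty-rectangle refresh is safe to run from interrupt context is a separate question.

```cpp
#include <atomic>
#include <cassert>

class DeferredScreen {
public:
    // Called by the engine as often as it likes: just marks a refresh as due.
    void updateScreen() { _pending.store(true, std::memory_order_release); }

    // Called once per 1/60 s tick. Returns true if a refresh was performed.
    bool onTick() {
        // exchange() reads and clears the flag in one step, so an
        // updateScreen() racing with the tick is not lost.
        if (!_pending.exchange(false, std::memory_order_acq_rel))
            return false;
        ++_refreshes;  // stand-in for "copy dirty rectangles / run C2P"
        return true;
    }

    unsigned refreshes() const { return _refreshes; }

private:
    std::atomic<bool> _pending{false};
    unsigned _refreshes = 0;
};
```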
Eero Tamminen
Fuji Shaped Bastard
Posts: 3108
Joined: Sun Jul 31, 2011 1:11 pm

Re: ScummVM/Falcon060 pre-release

Post by Eero Tamminen »

For some reason, disabling vsync (with single-buffering) from the GUI, or from scummvm.ini, does not stick. When I check the GUI again, it's enabled again.

Dig demo intro

Compared to: viewtopic.php?p=445008#p445008

Takes several seconds longer, maybe due to forced Vsync?

Code: Select all

Time spent in profile = 232.23518s.
...
Used cycles:
  26.73%  26.80%  28.57%  1992086345 1997271745 2128905808   Scumm::AkosRenderer::byleRLEDecode(Scumm::BaseCostumeRenderer::ByleRLEData&)
  13.13%  13.16%  13.16%   978388452 980939372 980939372   Scumm::SmushDeltaBlocksDecoder::proc4WithoutFDFE(unsigned char*, unsigned char const*, int, int, int, int, short*)
  11.45%                   853342959                       rate.o
   4.94%   4.94%   4.94%   367996784 368079274 368079274   OSystem_Atari::getMillis(bool)
   3.62%                   269502522                       AtariGraphicsManager::updateScreen()
   3.05%                   227010307                       Scumm::IMuseDigiInternalMixer::mixBits8ConvertToStereo(unsigned char*, int, int, int, int*, int*, bool)
   2.79%                   207587567                       OSystem_Atari::delayMillis(unsigned int)
   2.18%   2.18%   2.18%   162285491 162687546 162687546   Scumm::ScummEngine::remapPaletteColor(int, int, int, int)
   2.04%                   152289540                       copy256
   2.01%   2.01%   2.01%   149469663 149839941 149839941   Scumm::IMuseDigiInternalMixer::loop(unsigned char**, int)
   1.94%                   144823061                       Audio::makePacketizedRawStream(int, unsigned char)
   1.63%                   121282241                       Scumm::CharsetRendererV7::drawCharV7(unsigned char*, Common::Rect&, int, int, int, short, Scumm::TextStyleFlags, unsigned char)
   1.47%                   109338157                       c2p1x1_8_rect_start
   1.10%                    82124107                       c2p1x1_8_rect_pix16
   1.02%                    76317175                       Common::unlockMemoryPoolMutex()
   1.01%   1.01%   3.64%    75362516  75545345 271546621   Scumm::ScummEngine::resetActorBgs()
   0.86%                    63941408                       common2
   0.78%                    58162123                       c2p1x1_8_start
   0.76%   0.77%   2.50%    56949552  57092094 186409605   Scumm::ScummEngine::getResourceAddress(Scumm::ResType, unsigned short)
   0.70%   0.70%   0.71%    51830223  51962620  52885635   Scumm::ResourceManager::createResource(Scumm::ResType, unsigned short, unsigned int)
"byleRLEDecode()" is now called slightly fewer times, although the intro takes slightly longer than with the first dlmalloc version.

Cost per call is now:

Code: Select all

Used cycles:
  26.73%  26.80%  28.57%  62.08780s 62.24942s 66.35208s 1992086345 1997271745 2128905808   Scumm::AkosRenderer::byleRLEDecode(Scumm::BaseCostumeRenderer::ByleRLEData&) (0x1550eec, 0.01340s, 0.01343s, 0.01432s / call)
When it was with the first dlmalloc version:

Code: Select all

Used cycles:
  25.97%  26.03%  27.87%  58.76783s 58.90289s 63.06484s 1885565094 1889898632 2023434498   Scumm::AkosRenderer::byleRLEDecode(Scumm::BaseCostumeRenderer::ByleRLEData&) (0x1550e4c, 0.01264s, 0.01267s, 0.01356s / call)
=> RLE decoder calls are now 5-6% slower

Not sure why. It's not calling inlined functions, at least not directly. There have been no changes to the "engines/scumm/akos.cpp" file for months. The new Timer C interrupt slowing down unrelated things that much sounds unlikely too.

Looking at your repo log, I started to wonder whether this: https://github.com/mikrosk/scummvm/comm ... b869068f2d

could somehow affect this: https://github.com/mikrosk/scummvm/blob ... s.cpp#L483


Game play

Code: Select all

Time spent in profile = 316.86596s.
...
Used cycles:
  53.21%  53.33%  56.61%  5409577379 5422125160 5755809969   Scumm::AkosRenderer::byleRLEDecode(Scumm::BaseCostumeRenderer::ByleRLEData&)
   7.23%                   735491318                       rate.o
   4.02%                   409128470                       AtariGraphicsManager::updateScreen()
   2.43%                   246778778                       c2p1x1_8_rect_start
   1.83%                   186109032                       c2p1x1_8_rect_pix16
   1.64%                   166891978                       Scumm::IMuseDigiInternalMixer::mixBits8ConvertToStereo(unsigned char*, int, int, int, int*, int*, bool)
   1.55%                   157888166                       common2
   1.47%   1.47%   3.23%   149254298 149435106 327916851   Scumm::ScummEngine::getResourceAddress(Scumm::ResType, unsigned short)
   1.32%                   134661949                       Scumm::CharsetRendererV7::drawCharV7(unsigned char*, Common::Rect&, int, int, int, short, Scumm::TextStyleFlags, unsigned char)
   1.26%   1.26%   1.26%   127823579 128127937 128127937   Scumm::IMuseDigiInternalMixer::loop(unsigned char**, int)
   1.23%                   125355872                       Audio::makePacketizedRawStream(int, unsigned char)
   1.12%   1.12%   6.11%   113618420 113909614 621380851   Scumm::ScummEngine::resetActorBgs()
   1.12%                   113490936                       copy16
   1.00%                   101521317                       less256
   0.99%   0.99%   5.27%   100571021 100826787 535983333   Scumm::Gdi::resetBackground(int, int, int)
   0.86%   0.86%   1.28%    87487164  87671919 129947406   Scumm::debugC(int, char const*, ...)
   0.77%   0.78%   0.78%    78558854  78798105  78798105   Scumm::bompApplyShadow(int, unsigned char const*, unsigned char const*, unsigned char*, int, unsigned char, bool)
   0.68%   0.34%   0.34%    68833411  34483477  34483477 * Scumm::ResourceManager::increaseResourceCounters()
   0.64%                    64582193                       Common::unlockMemoryPoolMutex()
   0.60%   0.67%   2.11%    61343810  67756278 214018283   Scumm::ScummEngine::findResource(unsigned int, unsigned char const*)
   0.59%                    59624420                       exit
   0.58%   0.58%  57.60%    58535427   58664793 5856361603   Scumm::AkosRenderer::drawLimb(Scumm::Actor const*, int)
   0.58%   0.58%   0.58%    58484710  58640055  58640055   Scumm::ScummEngine::remapPaletteColor(int, int, int, int)
   0.54%   0.55%   4.44%    55400045  55514830 451171852   Scumm::ScummEngine::executeScript()
   0.50%   0.50%   0.63%    50646476  50767853  64531694   Scumm::ScummEngine::setActorRedrawFlags()
   0.49%   0.49%   3.28%    49691750  49834498 333757667   Scumm::ScummEngine::getMaskBuffer(int, int, int)
   0.41%   0.42%   0.42%    42120261  42232833  42232833   Common::DebugManager::isDebugChannelEnabled(unsigned int, bool)
   0.41%                    41989391                       Common::StackLock::lock()
   0.38%   6.51%   6.51%    38296022 661725082 661725082   _memmove.2
=> Most of time during play goes to RLE decoding.
Eero Tamminen
Fuji Shaped Bastard
Posts: 3108
Joined: Sun Jul 31, 2011 1:11 pm

Re: ScummVM/Falcon060 pre-release

Post by Eero Tamminen »

Profile data (from the first ScummVM dlmalloc version) for the new demos I listed above...

(I will recheck the demos that used SCI engine with the latest ScummVM binary later on.)

The Feeble Files

Small part of the excruciatingly slow startup:

Code: Select all

Time spent in profile = 264.90074s.
...
Visits/calls:
  15.51%  15.51%     1262958   1262958   Video::SmallHuffmanTree::getCode(Common::BitStreamImpl<Common::BitStreamMemoryStream, unsigned int, 8, false, false>&) [clone .part.13]
  12.13%  12.13%      987763    987763   Video::BigHuffmanTree::getCode(Common::BitStreamImpl<Common::BitStreamMemoryStream, unsigned int, 8, false, false>&)
  10.61%              863890             c2p1x1_8_start
  10.61%              863845             c2p1x1_8_pix16
...
Used cycles:
  28.77%  28.88%  28.88%  244534052424549269322454926932   Video::BigHuffmanTree::getCode(Common::BitStreamImpl<Common::BitStreamMemoryStream, unsigned int, 8, false, false>&)
  19.31%  19.38%  19.38%  164101373616474869721647486972   Video::SmallHuffmanTree::getCode(Common::BitStreamImpl<Common::BitStreamMemoryStream, unsigned int, 8, false, false>&) [clone .part.13]
   9.96%   9.91%  38.38%   846707936 8420173243261894856 * Video::SmackerDecoder::SmackerVideoTrack::decodeFrame(Common::BitStreamImpl<Common::BitStreamMemoryStream, unsigned int, 8, false, false>&)
   7.81%                   663693916                       c2p1x1_8_start
   5.86%                   497734784                       c2p1x1_8_pix16
   5.69%                   483285704                       copy256
   4.52%                   384417768                       rate.o
   4.44%   4.46%  26.25%   377316216 3790577282230834800   Video::SmackerDecoder::SmackerAudioTrack::queueCompressedBuffer(unsigned char*, unsigned int, unsigned int)
   2.43%                   206630796                       _termuser
   2.18%   2.19%  10.52%   185212448 185969668 894227740   Video::SmallHuffmanTree::decodeTree(unsigned int, int)
=> Common Huffman decoding slowness

Playtoons demo

This non-interactive demo was fast, but I'm not sure whether that's representative of actual gameplay:

Code:

Time spent in profile = 117.83578s.
...
Visits/calls:
  13.05%             4047760             c2p1x1_8_rect_start
  13.04%             4046752             c2p1x1_8_rect_pix16
...
Used cycles:
  14.79%  14.83%  15.11%   559277237 560596792 571443033   Video::CoktelDecoder::deLZ77(unsigned char*, unsigned char const*, unsigned int, unsigned int)
  13.40%                   506802160                       rate.o
  11.47%                   433826371                       c2p1x1_8_rect_start
   8.83%                   333958049                       c2p1x1_8_rect_pix16
   8.81%                   332944045                       AtariGraphicsManager::updateScreen()
   6.78%                   256176955                       _termuser
=> Screen updates and LZ77 decoding

Broken Sword: The Shadow of the Templars (demo gameplay)
Eero Tamminen wrote: Mon Mar 27, 2023 8:21 pm
  • "PSX stream cutscene 'intro.str' cannot be played in paletted mode"
  • Slow and sound echoes horribly, but playable (unlike BS 2)
Screen updates take most of the time, but "rate.o" is more expensive here than in other games:

Code:

Time spent in profile = 334.60726s.
...
Visits/calls:
  12.89%             6601538             c2p1x1_8_rect_start
  12.80%             6554633             c2p1x1_8_rect_pix16
...
Used cycles:
  16.45%                  1766235398                       rate.o
  12.58%  12.62%  18.46%  135093849813544645831981566504   Sword1::Screen::drawPsxParallax(unsigned char*, unsigned short, unsigned short, unsigned short)
  10.01%  10.11%  16.40%  107480910710858914801760368574   Audio::XAStream::readBuffer(short*, int)
   8.77%   8.79%   8.79%   941903954 944022082 944022082   Sword1::Screen::decompressHIF(unsigned char*, unsigned char*)
   6.69%   6.71%   6.71%   718474474 720173589 720173589   Sword1::Screen::drawSprite(unsigned char*, unsigned short, unsigned short, unsigned short, unsigned short, unsigned short)
   6.61%                   709332022                       c2p1x1_8_rect_start
   6.60%                   708441737                       copy256
   5.06%                   543689799                       c2p1x1_8_rect_pix16
   4.25%                   456670231                       AtariGraphicsManager::updateScreen()
   3.52%   3.52%   3.52%   377577307 378406694 378406694   Sword1::Screen::blitBlockClear(unsigned short, unsigned short, unsigned char*)
   3.02%   3.02%   6.28%   323793089 324540470 674584489   Common::MemoryReadStream::read(void*, unsigned int)
   2.85%   2.86%   2.86%   306312710 307063590 307063590   Sword1::Screen::fastShrink(unsigned char*, unsigned int, unsigned int, unsigned int, unsigned char*)
   1.76%                   189278851                       set256
   1.48%                   158895380                       common2
   0.77%                    82669352                       copy16
   0.76%   4.86%  20.33%    82011213 5220927282183020710   Sword1::Screen::updateScreen()
   0.51%   0.51%   1.09%    54647832  54767349 116939199   Sword1::Logic::interpretScript(Sword1::Object*, int, Sword1::Header*, int, int)
   0.47%  11.20%  11.20%    5037069212024859321202485932   _memmove.2
Eero Tamminen

Re: ScummVM/Falcon060 pre-release

Post by Eero Tamminen »

(The first dlmalloc ScummVM version is still used here, so clock checks still go to the OS.)

Personal Nightmare (non-interactive)

This is an Atari ST game, so its data is in bitplane format. However, ScummVM converts it into chunky format (using the AGOS engine's Amiga function) before the backend converts it back to Atari format again.
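To make the double conversion concrete, here is a minimal C sketch of both directions for one 16-pixel group, assuming the Atari ST low-resolution layout (4 interleaved bitplane words per 16 pixels, leftmost pixel in the most significant bit). The function names are hypothetical; this is not the AGOS `bitplaneToChunky()` code or the backend's c2p routine, just the same technique written out in plain C.

```c
#include <stdint.h>

#define PLANES 4  /* assumption: ST low resolution, 16 colours */

/* Unpack one 16-pixel group of interleaved bitplane words into
   16 chunky pixels (one colour index per byte). */
void planar_to_chunky16(const uint16_t planes[PLANES], uint8_t out[16])
{
    for (int x = 0; x < 16; x++) {
        int bit = 15 - x;  /* leftmost pixel lives in the most significant bit */
        uint8_t idx = 0;
        for (int p = 0; p < PLANES; p++)
            idx |= (uint8_t)(((planes[p] >> bit) & 1u) << p);
        out[x] = idx;
    }
}

/* The reverse (c2p) step: pack 16 chunky indices back into bitplane words. */
void chunky_to_planar16(const uint8_t in[16], uint16_t planes[PLANES])
{
    for (int p = 0; p < PLANES; p++)
        planes[p] = 0;
    for (int x = 0; x < 16; x++) {
        int bit = 15 - x;
        for (int p = 0; p < PLANES; p++)
            planes[p] |= (uint16_t)(((in[x] >> p) & 1u) << bit);
    }
}
```

Per pixel, both directions touch every plane, so converting planar data to chunky only to convert it back again is pure overhead whenever large screen areas are updated.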

While the demo spends much of its time busy-waiting, there are places where it does larger updates, and there the extra conversions matter:

Code:

Time spent in profile = 123.15018s.
...
Used cycles:
  17.72%                   700231604                       _termuser -- Super() call for checking 200hz counter
  10.62%  10.64%  10.64%   419505415 420501940 420501940   AGOS::bitplaneToChunky(unsigned short*, unsigned char, unsigned char*&)
   7.84%                   309808759                       _enter
   5.03%                   198905088                       AGOS::AGOSEngine::drawVertImageUncompressed(AGOS::VC10_state*)
   4.73%                   186881862                       AtariGraphicsManager::updateScreen()
   4.67%   6.35%  36.71%   184470938 2510671411450369809   Common::EventDispatcher::dispatch()
   3.91%  33.97%  33.97%   15430091913422127581342315660   _get_sysvar
   3.02%                   119366982                       c2p1x1_8_start
   2.94%                   116338368                       CARTRIDGE
   2.77%                   109321558                       .get_usermode
   2.42%   2.47%  97.94%    95747872  976031323869845134   AGOS::AGOSEngine::delay(unsigned int)
   2.40%                    94845970                       c2p1x1_8_pix16
   2.33%                    91897214                       rate.o
   2.03%   2.04%  14.20%    80286359  80473441 560999658   AGOS::AGOSEngine::convertAmigaImage(AGOS::VC10_state*, bool)
   1.88%   1.88%  18.32%    74211267  74375324 723751095   AtariEventSource::pollEvent(Common::Event&)
   1.75%   1.75%   1.75%    69168013  69292797  69292797   AGOS::AGOSEngine::drawBackGroundImage(AGOS::VC10_state*)
   1.65%   1.65%  35.62%    65070856  652973741407515288   __clock
   1.52%   0.21%   4.44%    60079339   8393187 175255023 * OSystem_Atari::getMillis(bool)
   1.43%                    56670249                       _xbiostrap
   1.15%                    45277458                       _buffptr
   0.90%                    35424539                       _usleep
   0.87%                    34374860                       events.o.7
   0.86%   2.17%  24.99%    33947288  85774367 987228665   virtual thunk to OSystem_Atari::getMillis(bool)
   0.82%   0.82%  42.87%    32385742  325179491694005479   DefaultEventManager::pollEvent(Common::Event&)
   0.77%                    30519282                       .return.2
   0.77%   0.77%   5.67%    30282557  30347079 224139363   Common::VirtualMouse::pollEvent(Common::Event&)
The call graph for this looks unusually straightforward.
mikro

Re: ScummVM/Falcon060 pre-release

Post by mikro »

Eero, it's possible that I've messed something up with the cursor setting. So even if you set in scummvm.ini "single" and vsync off, it's overridden?

The goal was to always override vsync for double buffering (always on) and triple buffering (always off), and keep the user's choice for the rest (SuperVidel's direct buffering and single buffering). I'll take a look today; it makes sense to compare profile dumps with the same setup.
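The intended policy fits in a small pure function. This is a sketch with hypothetical names (`gfx_mode`, `effective_vsync`), not the Atari backend's actual code:

```c
#include <stdbool.h>

/* Hypothetical mode names; the real backend's enum and config keys may differ. */
typedef enum {
    GFX_DIRECT,  /* SuperVidel direct buffering */
    GFX_SINGLE,  /* single buffering */
    GFX_DOUBLE,  /* double buffering */
    GFX_TRIPLE   /* triple buffering */
} gfx_mode;

/* Returns the vsync value actually used at runtime for a mode and the
   user's stored choice. The stored choice itself is never modified. */
bool effective_vsync(gfx_mode mode, bool user_vsync)
{
    switch (mode) {
    case GFX_DOUBLE:
        return true;        /* forced on */
    case GFX_TRIPLE:
        return false;       /* forced off */
    default:
        return user_vsync;  /* direct and single keep the user's choice */
    }
}
```

Because the function only reads the user's choice and returns the value to use, a forced value for double or triple buffering never has to be written back to scummvm.ini.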

That RLE fix is related to the splash screen, so if it didn't work properly, you'd have noticed. So I don't think those are related. Better retest after the vsync fix; maybe those are related.
Eero Tamminen

Re: ScummVM/Falcon060 pre-release

Post by Eero Tamminen »

mikro wrote: Wed Mar 29, 2023 8:54 am Eero, it's possible that I've messed something up with the cursor setting. So even if you set in scummvm.ini "single" and vsync off, it's overridden?
If I set it to false directly in scummvm.ini, or toggle it off in the GUI, and then check the GUI again, it's shown as selected.

Also, if I set it to false directly in scummvm.ini, start ScummVM and just run the currently selected game, ScummVM still overrides the value in scummvm.ini back to true.

All this happens when "gfx_mode=single".
mikro

Re: ScummVM/Falcon060 pre-release

Post by mikro »

Another batch: https://mikro.naprvyraz.sk/private/scum ... 230329.zip

This archive contains both 2.6.1 (compiled with dlmalloc, no asserts and the FPS counter) and the latest master (compiled with dlmalloc, no asserts, inline changes and the FPS counter). The FPS counter is pretty pathetic: every second it prints to the console the average of all FPS measured during that second, then resets. But it's a good compromise between spamming the console and having something useful. You can use it when measuring automated sequences for best results.
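A per-second reporter of this kind can be sketched as follows, assuming a millisecond clock such as the value `getMillis()` returns. The struct and function names are hypothetical, and this version computes frames divided by elapsed time rather than averaging individual per-frame FPS samples; the two can differ when frame times vary.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical once-per-second FPS reporter. */
typedef struct {
    uint32_t window_start;  /* start of the current window, in ms */
    uint32_t frames;        /* frames presented in the current window */
} fps_counter;

/* Call once per presented frame. When at least one second has elapsed,
   prints and returns the average FPS and starts a new window;
   otherwise returns -1. */
double fps_frame(fps_counter *c, uint32_t now_ms)
{
    c->frames++;
    uint32_t elapsed = now_ms - c->window_start;  /* unsigned: safe across wrap */
    if (elapsed < 1000)
        return -1.0;
    double fps = c->frames * 1000.0 / elapsed;
    printf("FPS: %.1f\n", fps);
    c->window_start = now_ms;
    c->frames = 0;
    return fps;
}
```

Printing only once per second keeps console output (itself slow on the Falcon) out of the measurement most of the time.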

However, there is one odd thing which either proves that I have done something incredible or that I'm an idiot (I prefer to believe the latter). When measuring FPS in Full Throttle (both demo and full version), I can see that the FPS has increased by a good 30%, sometimes even more. I tried single buffering + vsync off as well as double buffering + (implicit) vsync on. I could believe it in the case of the latter (fullscreen updates instead of rectangle-based ones), but the former... no clue, no clue at all.

Anyway, the vsync bug is fixed (was way more complex than it looked) and I *think* I've found the rare issue with wrong dirty rectangles.

Oh, and I have also tested the above-mentioned FPS on real hardware, so it's not something related to Aranym. And while doing so...
Eero Tamminen wrote: Sun Mar 26, 2023 7:41 pm Could you measure the time, e.g. with a stopwatch? That way we'll have a proper comparison of how the time from the title screen to game start compares between a real machine and an emulated 32 MHz Falcon (GEMDOS HD: ~20s, IDE image: 6-7min, floppy image: hours...).
So... from the credits screen to the "Load" image ... 1m06s on 68060 @ 66 MHz, fast IDE DOM, FreeMiNT, big disk cache, single buffer, vsync off (latest master). From the "Load" image to the first scene ... 2m20s (!)
Eero Tamminen wrote: Sun Mar 26, 2023 10:24 pm Invoking the overlay dialog in Full Throttle now requires Ctrl-F5. And as they are 8-bpp, colors change for the game gfx in the background while the overlay is open, which was initially slightly distracting. I much prefer things remaining in the same video mode though!
That's really nice, but unfortunately it is not so simple. :) Previously, the overlay meant switching to hicolour (320x240), so the chunky buffer was converted from indices + palette to 16-bit hicolour pixels. Now, as the overlay is 8-bit, I can't do this anymore. The GUI internally requires chunky pixels with no palette (for the blending effects), so one guy came up with the idea of using RGB332 pixels for it. So now, every time the overlay is active, pixels in the chunky buffer are converted on the fly from indices + palette into RGB332 (which uses its own palette, a slightly darker one). Since there are only 3 + 3 + 2 bits for each pixel's colour instead of 8 + 8 + 8 (24-bit palette!), what you observe is a loss of colour information.
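The RGB332 packing itself is plain bit selection. A minimal C sketch, assuming a palette stored as 8-bit r, g, b entries (the `rgb24` type and the function names are hypothetical, not ScummVM's):

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb24;  /* one 24-bit palette entry */

/* Keep the top 3 bits of red, top 3 of green and top 2 of blue. */
uint8_t rgb_to_rgb332(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6));
}

/* Convert a run of palette indices from the chunky buffer to RGB332. */
void indices_to_rgb332(const uint8_t *src, uint8_t *dst, int count,
                       const rgb24 *palette)
{
    for (int i = 0; i < count; i++) {
        rgb24 c = palette[src[i]];
        dst[i] = rgb_to_rgb332(c.r, c.g, c.b);
    }
}

/* Expand back to 8 bits per channel by plain shifting (low bits zero). */
void rgb332_to_rgb(uint8_t v, uint8_t *r, uint8_t *g, uint8_t *b)
{
    *r = (uint8_t)(v & 0xE0);
    *g = (uint8_t)((v << 3) & 0xE0);
    *b = (uint8_t)((v & 0x03) << 6);
}
```

With plain shifting, full white 0xFF expands to (0xE0, 0xE0, 0xC0), which is one way such a palette can come out slightly darker; the post does not say how ScummVM's RGB332 palette is actually built.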

I'm going to check the Gabriel Knight cursor bug now...

EDIT: I did, both the demo and the full version, and it looks good to me. Hopefully it was fixed by that cursor bugfix.