mikro wrote: ↑Mon Mar 27, 2023 7:58 pm
Check out
https://mikro.naprvyraz.sk/private/scum ... 230327.zip ... this is a version with inlined ScummEngine::testGfxUsageBit, Script::offsetIsObject, getSciVersion, reg_t::getOffset() and reg_t::setOffset (all of them are quite short so there shouldn't be any bloated parts). Also with replaced usleep() and clock() calls with my custom timer c counter so you shouldn't see any Super() or _sysvar stuff in your profiling (maybe except getTimeAndDate() and its call to time() & localtime() but that shouldn't be called too often, if at all).
If you could go through your previous profilings and compare the results, that would be extremely helpful when proposing those changes upstream.

(you don't have to go through every game, just the biggest offenders).
Used the good ol' Xmas card 1988 again for comparing this, as it's only SCI engine thing with rather long period of automation
and clear start & end points.
When OS is taken as single thing, it looks like this with yesterday's ScummVM version:
Code: Select all
Time spent in profile = 391.78698s.
...
Used cycles:
13.30% 5.49% 22.53% 1671996553 6900802372832168959 * Sci::run_vm(Sci::EngineState*)
12.83% 12.86% 16.86% 161261027916161878262119434820 Sci::GfxView::draw(Common::Rect const&, Common::Rect const&, Common::Rect const&, short, short, unsigned char, unsigned short, bool, unsigned short)
7.22% 7.23% 9.37% 907312384 9092673051178314065 Sci::reg_t::getOffset() const
4.66% 4.67% 5.10% 586002804 587231140 641515026 Sci::readPMachineInstruction(unsigned char const*, unsigned char&, short*)
4.57% 574216734 AtariGraphicsManager::updateScreen()
4.30% 4.31% 13.77% 540372666 5415941591731054556 Sci::SegManager::getObject(Sci::reg_t) const
3.91% 3.92% 3.92% 491489613 492522501 492522501 Sci::getSciVersion()
3.90% 3.91% 3.91% 490660791 491720322 491720322 Sci::GfxView::getMappedColor(unsigned char, unsigned short, Sci::Palette const*, int, int)
3.51% 440694950 ROM_TOS
And with the new version:
Code: Select all
Time spent in profile = 390.86795s.
...
Used cycles:
15.22% 6.10% 19.77% 1908626592 7648619972479752135 * Sci::run_vm(Sci::EngineState*)
15.16% 15.20% 20.15% 190073925219056598872526929806 Sci::GfxView::draw(Common::Rect const&, Common::Rect const&, Common::Rect const&, short, short, unsigned char, unsigned short, bool, unsigned short)
6.04% 6.06% 6.73% 757547929 759504857 843582326 Sci::readPMachineInstruction(unsigned char const*, unsigned char&, short*)
5.65% 4.00% 6.16% 708354175 501467337 772031261 * Sci::SegManager::getObject(Sci::reg_t) const
5.55% 696020125 AtariGraphicsManager::updateScreen()
4.84% 4.85% 4.85% 607336631 608856637 608856637 Sci::GfxView::getMappedColor(unsigned char, unsigned short, Sci::Palette const*, int, int)
3.02% 3.03% 3.03% 379087240 380046677 380046677 Sci::GfxScreen::vectorIsFillMatch(short, short, unsigned char, unsigned char, unsigned char, unsigned char, bool)
2.59% 324492990 ROM_TOS
(I think 1s run-time time diff is just due to me not starting & stopping profiling exactly at the same time.)
As to time handling, instead of usleep(), clock(), get_sysvar() + ROM_TOS taking 3.73%, I see now OSystem_Atari::delayMillis() + ROM_TOS taking 3.8%. Which could be either due to there being marginally more time for busy waiting, their portion increasing (from others using less), or both.
From profile one can see run_vm() exclusive cost increasing from 5.49% to 6.10% from the inlining, but its inclusive (total) cost dropping from 22.53% to 19.77%, which looks good.
When comparing the absolute cycle counts between these runs, and not just percentages, things did not look so good though. run_vm() exclusive cost increased by 14%
and inclusive (total) cost by 11%.
Looking closer, it came mostly from this call chain:
Code: Select all
run_vm()
-> GfxAnimate::kernelAnimate()
-> GfxAnimate::drawCels()
-> GfxPaint16::drawCel() (4x)
-> GfxView::draw()
-> GfxView::getMappedColor() (455x)
Cost from both of the last functions in the call-chain had noticeably increased.
However, then I noticed that all of these functions from run_vm() down to getMappedColor(),
had been called 21-24% more often (in run_vm() case, 11506 instead of 9386 times).
I.e. with less time being spent on OS side and trivial functions overhead, demo had had time to do more VM rounds and animations.
When comparing the per-call timings for run_vm(), provided by Hatari profiler post-processor "-i" option, the "between-symbols", "exclusive" and "inclusive" costs were earlier:
Code: Select all
Used cycles:
13.30% 5.49% 22.53% 52.11149s 21.50789s 88.27084s 1671996553 6900802372832168959 * Sci::run_vm(Sci::EngineState*) (0x2329f64, 0.00555s, 0.00229s, 0.00940s / call)
And now:
Code: Select all
Used cycles:
15.22% 6.10% 19.77% 59.48659s 23.83862s 77.28699s 1908626592 7648619972479752135 * Sci::run_vm(Sci::EngineState*) (0x2330190, 0.00517s, 0.00207s, 0.00672s / call)
I.e. total run_vm() cost went from 0.00940s / call, down to 0.00672s / call, which is
28.5% improvement!
(0.00672 / 0.00940 = 0.7148)