https://dl.dropboxusercontent.com/u/129 ... fext2.diff
The reason for this was a need to inspect instruction-level timings for optimization purposes. This is a bit different from the intended use case for the Hatari profiler, which generates a compact format subsequently digested by a python script, which in turn produces a classic, high-level profile view of program activity (i.e. function-level and callgraph profiling). This is very useful for large programs with lots of bits of code and complex flow control.
The original 'raw' format is however a bit lacking in detail for using the profile data directly and spot-optimizing instruction sequences, so I made some changes to make this work better for me.
What extra info does the patch show?
- more precise measurements, to 5 decimals (versus 2)
- shows execution cycles as a %age of total runtime (versus total hits as a %age - not quite the same)
- shows external access penalities (sometimes unpredictable cost caused by conflict when accessing external dsp ram more than once per op)
- shows average cycle counts and average penalties over profiling session (instead of just the last cost measured)
- shows ops with varying execution time
- fields prefixed with <columncode>: to make scanning easier for custom import to Excel or with python
DISCLAIMER: my changes are incompatible with the hatari_profile.py and hatari_spinloop.py scripts - so you can't generate normal callgraphs or profiled function lists with my patch applied. All you can do is look at instructions and their costs directly. If you want to use both kinds of view, you'll need to build two versions of Hatari, as I did. I don't use this special version very often but it has turned out to be very useful for some kinds of optimization work.
An example of the difference in output before/after the patch:
Code: Select all
before (unpatched version):
p:0271 5eec79 (06 cyc) tfr y1,b y:(r4+n4),a 0.04% (130576, 783456, 0)
after (patched version):
p:0267 4dec45 (6:2) cmp x0,a y:(r4+n4),x1 u:00.04252% c:00.05813% (U: 52102 avC:6.00 avE:2.00 V:0)
A typical profile packet in the new format looks like this:
<address> <opwords> (X:Y) <opcode> u:00.00189% c:00.00258% (U: 2315 avC:6.00 avE:0.00 V:0)
u: instruction usage as a percentage of total profiled instructions
c: cycle usage as a percentage of total profiled time
U: actual (absolute) usage for this instruction (total hits)
avC: measured average cycles for this instruction
avE: measured average external memory penalty for this instruction
V: peak variance seen in terms of cycles (e.g. 2 = execution time varied by 2 cycles worst case during profile run) note that this can also cause non-integer averaged cycle measurements for any instruction.
X: cycles taken by the *last* visit to this instruction
Y: external penalty cycles taken by the *last* visit to this instruction