C compiler benchmarking

C and PASCAL (or any other high-level languages) in here please

Moderators: simonsunnyboy, Mug UK, Zorro 2, Moderator Team

czietz
Hardware Guru
Posts: 2514
Joined: Tue May 24, 2016 6:47 pm

C compiler benchmarking

Post by czietz »

Sometimes, I read statements such as “C compiler x is always superior to y”. But I like to make my choices based on hard data. I chose the industry-standard CoreMark benchmark for this. Instead of benchmarking different machines (as I did elsewhere), I built CoreMark with different compilers and ran it on the same machine (ST @ 8 MHz). Since CoreMark is designed to contain many algorithmic “building blocks” commonly found in real-world applications (read the whitepaper if you’re interested), its performance is, imho, a good indication of the overall quality of the compiler’s code generation.

Here is the number of iterations per second, achieved with different compilers. Higher is better:

Code: Select all

Pure C 1.1       :  0.81
vbcc 0.9hp2 (-O3):  1.19
gcc 2.95    (-O2):  1.28
gcc 4.6.4   (-O2):  1.72
gcc 8.2.1   (-O2):  1.92
gcc 9.3.1   (-O2):  1.92
The best version is more than twice as fast as the slowest version. This shows that it’s important to compare compilers. But by all means, gather your own data; benchmark your specific application.
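
To put the table into relative terms, here is a minimal C sketch that computes speedup ratios from the scores listed above. The function names (`speedup`, `percent_gain`, `best_to_worst_ratio`) are made up for illustration; only the scores themselves come from the table.

```c
#include <stddef.h>

/* CoreMark scores (iterations/sec) from the table above, in the order listed. */
static const double scores[] = { 0.81, 1.19, 1.28, 1.72, 1.92, 1.92 };

/* How many times faster is `score` than `baseline`? */
double speedup(double score, double baseline)
{
    return score / baseline;
}

/* The same comparison expressed as a percentage gain over `baseline`. */
double percent_gain(double score, double baseline)
{
    return (speedup(score, baseline) - 1.0) * 100.0;
}

/* Ratio between the highest and the lowest score in the table. */
double best_to_worst_ratio(void)
{
    double best = scores[0], worst = scores[0];
    size_t i;

    for (i = 1; i < sizeof scores / sizeof scores[0]; i++) {
        if (scores[i] > best)
            best = scores[i];
        if (scores[i] < worst)
            worst = scores[i];
    }
    return best / worst;
}
```

With these numbers, `best_to_worst_ratio()` comes out at roughly 2.37, and `percent_gain(1.92, 1.72)` (gcc 9.3.1 against gcc 4.6.4) at roughly 12%.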

PS: The vbcc version was built with “-O3” for a reason: CoreMark checks its results, and it reports an error when I build it with vbcc 0.9h and “-O2”. There must be a bug in the code generation. Not very trustworthy, imho.
medmed
Atari Super Hero
Posts: 692
Joined: Sat Apr 02, 2011 5:06 am
Location: France, Paris

Re: C compiler benchmarking

Post by medmed »

Hi,

That means gcc 9 compiled app should be "theoretically" ~20% faster than gcc 4 compiled app?
M.Medour - 1040STF, Mega STE + Spektrum card, Milan 040 + S3Video + ES1371.
Orion_
Atari Super Hero
Posts: 552
Joined: Sat Jan 10, 2004 12:20 pm
Location: France

Re: C compiler benchmarking

Post by Orion_ »

That is what I noticed. I used VBCC for years because it was easier to set up and could interface easily with ASM code using fastcall parameters in registers, but then I switched to gcc8 and later gcc9, and the same code was about 2 times faster than with VBCC.

The only struggle is switching from this kind of call (VBCC)

Code: Select all

void	ConvertPI1PaletteToFalconPalette(__reg("a0") void *pi1pal, __reg("a1") void *falconpal, __reg("d0") u16 ncolors);
to this more complex call (gcc)

Code: Select all

static inline void ConvertPI1PaletteToFalconPalette(void* pi1pal, void* falconpal, u16 ncolors)
{
	/* Pin each argument to the register the asm routine expects. */
	register void* _pi1pal asm("a0") = pi1pal;
	register void* _falconpal asm("a1") = falconpal;
	register u16 _ncolors asm("d0") = ncolors;
	asm volatile ("jsr	_ConvertPI1PaletteToFalconPalette"
		/* "+r" marks each operand as both read and written, so no separate input list is needed. */
		: "+r"(_pi1pal), "+r"(_falconpal), "+r"(_ncolors)
		:
		/* Registers the routine trashes; gcc keeps live values out of them. */
		: "d1", "d2", "d3", "a2", "cc", "memory");
}
But at least it's even faster, because you don't have to save the used registers: you just tell gcc which registers were trashed by the asm routine, and it will optimize the calling code by itself.
czietz
Hardware Guru
Posts: 2514
Joined: Tue May 24, 2016 6:47 pm

Re: C compiler benchmarking

Post by czietz »

medmed wrote: Sat Sep 03, 2022 2:56 pm That means gcc 9 compiled app should be "theoretically" ~20% faster than gcc 4 compiled app?
Exact numbers will depend on the exact application, of course. That's why I recommended doing your own benchmarks. But often, software will visibly benefit from the improved optimization in the (architecture-independent) frontend, yes.
Anima
Atari Super Hero
Posts: 921
Joined: Fri Mar 06, 2009 9:43 am

Re: C compiler benchmarking

Post by Anima »

Interesting comparison. Thanks for the numbers. :cheers:
medmed
Atari Super Hero
Posts: 692
Joined: Sat Apr 02, 2011 5:06 am
Location: France, Paris

Re: C compiler benchmarking

Post by medmed »

Many thanks guys - I'll try myself asap.
M.Medour - 1040STF, Mega STE + Spektrum card, Milan 040 + S3Video + ES1371.
uko
Obsessive compulsive Atari behavior
Posts: 118
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: C compiler benchmarking

Post by uko »

I didn't know about your benchmarking of machines; it is very interesting too!
David aka Uko, from T.AL
Take a look at our last STe demo ! The Star Wars Demo and to its "making of"
https://github.com/Uko-TAL
ThorstenOtto
Fuji Shaped Bastard
Posts: 3069
Joined: Sun Aug 03, 2014 5:54 pm

Re: C compiler benchmarking

Post by ThorstenOtto »

Code: Select all

gcc 4.6.4 (mfastcall): 1.80
gcc 7.5 (-O2):         1.90
gcc 10.4 (-O2):        1.92
(measured with Hatari, but I think for 68k that should be quite accurate)

I'm a bit baffled that Pure-C compares so badly, but maybe that's because a) Pure-C does not do any inlining and b) the benchmark does not make many function calls.

Also, it might be worthwhile to check how much time is spent in the matrix functions. That's IMHO something that you don't commonly find in applications.
czietz
Hardware Guru
Posts: 2514
Joined: Tue May 24, 2016 6:47 pm

Re: C compiler benchmarking

Post by czietz »

ThorstenOtto wrote: Sun Sep 04, 2022 6:56 am Also, it might be worthwhile to check how much time is spent in the matrix functions. That's IMHO something that you don't commonly find in applications.
You would be surprised how many problems are actually solved by linear algebra (such as matrix multiplications): graphics and games engines, control systems, simulations, ... Plus - in the end - this is simply a benchmark for array accesses and integer math; and surely you'll agree that array accesses and integer math are something commonly found in applications.
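
To illustrate the point that a matrix multiplication boils down to array accesses and integer math, here is a minimal sketch. It is not CoreMark's actual matrix code; the function name `matmul_int`, the fixed size `N` and the element types are assumptions chosen for the example.

```c
#include <stdint.h>

#define N 3  /* small fixed size, chosen only for the example */

/* c = a * b for N x N integer matrices.
 * The inner loop is nothing but indexed loads, one integer multiply
 * and one integer add per step. */
void matmul_int(const int16_t a[N][N], const int16_t b[N][N], int32_t c[N][N])
{
    int i, j, k;

    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            int32_t acc = 0;
            for (k = 0; k < N; k++)
                acc += (int32_t)a[i][k] * (int32_t)b[k][j];
            c[i][j] = acc;
        }
    }
}
```

How well a compiler handles the address calculation for `a[i][k]` and `b[k][j]` and keeps `acc` in a register is the kind of thing such a loop exercises.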
pixelpusher
Retro freak
Posts: 11
Joined: Sat Sep 25, 2021 1:59 pm

Re: C compiler benchmarking

Post by pixelpusher »

ThorstenOtto wrote: Sun Sep 04, 2022 6:56 am

Code: Select all

gcc 4.6.4 (mfastcall): 1.80
gcc 7.5 (-O2):         1.90
gcc 10.4 (-O2):        1.92
(measured with Hatari, but I think for 68k that should be quite accurate)

I'm a bit baffled that Pure-C compares so badly, but maybe that's because a) Pure-C does not do any inlining and b) the benchmark does not make many function calls.

Also, it might be worthwhile to check how much time is spent in the matrix functions. That's IMHO something that you don't commonly find in applications.
The way you could use PureC effectively (in the 90s) was by using it like a big macro assembler: consider its register usage pattern (once it started using stack variables after exhausting the registers, its performance degraded), do loop unrolling on your own, and [if performance matters] use floating point only if you can utilize an FPU (and are on a 68020 or higher). Otherwise, use fixed-point arithmetic.
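
As a rough illustration of the fixed-point approach mentioned above, here is a minimal 8.8 sketch in C. The type name `fix8_8` and the helper names are made up for this example. The 8.8 format is an assumption picked because a multiply then needs only a 16x16 product with a 32-bit result, which the 68000 offers as a single instruction (`muls.w`); whether a given compiler actually emits it is something to check in the generated code.

```c
#include <stdint.h>

/* 8.8 fixed point: upper 8 bits integer part, lower 8 bits fraction. */
typedef int16_t fix8_8;

#define FIX_SHIFT 8

/* Convert a small integer (about -128..127) to 8.8 fixed point. */
fix8_8 fix_from_int(int v)
{
    return (fix8_8)(v * (1 << FIX_SHIFT));
}

/* Drop the fraction. Right-shifting a negative value is
 * implementation-defined in C; common compilers shift arithmetically. */
int fix_to_int(fix8_8 v)
{
    return v >> FIX_SHIFT;
}

/* Multiply two 8.8 values: widen to 32 bits, multiply, shift back.
 * The result must still fit into 16 bits; no overflow check is done. */
fix8_8 fix_mul(fix8_8 a, fix8_8 b)
{
    return (fix8_8)(((int32_t)a * (int32_t)b) >> FIX_SHIFT);
}
```

For example, 1.5 is stored as 384 and 2.0 as 512, and `fix_mul(384, 512)` yields 768, which is 3.0 in this format.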
joska
Hardware Guru
Posts: 5717
Joined: Tue Oct 30, 2007 2:55 pm
Location: Florø, Norway

Re: C compiler benchmarking

Post by joska »

It would be interesting to do the same tests on 040 and 060 too. I would expect the difference to be even more dramatic considering that the last version of Pure C predates any 040/060 TOS clone.
Jo Even

VanillaMiNT - Falcon060 - Milan060 - Falcon040 - MIST - Mega STE - Mega ST - STM - STE - Amiga 600 - Sharp MZ700 - MSX - Amstrad CPC - C64
AdamK
Captain Atari
Posts: 443
Joined: Wed Aug 21, 2013 8:44 am

Re: C compiler benchmarking

Post by AdamK »

It would also be interesting to see clang in the mix.
Atari: FireBee, Falcon030 + CT60e + SuperVidel + SvEthlana, TT, 520ST + 4MB ST RAM + 8MB TT RAM + CosmosEx + SC1435, 1040STFM + UltraSatan + SM124, 1040STE 4MB ST RAM + 8MB TT RAM + CosmosEx + NetUSBee + SM144 + SC1224, 65XE + U1MB + VBXE + SIDE2, Jaguar, Lynx II, 2 x Portfolio (HPC-006)

Adam Klobukowski [adamklobukowski@gmail.com]
SteveBagley
Captain Atari
Posts: 286
Joined: Mon Jan 21, 2013 9:31 am

Re: C compiler benchmarking

Post by SteveBagley »

And Lattice C 5 :)

Steve
czietz
Hardware Guru
Posts: 2514
Joined: Tue May 24, 2016 6:47 pm

Re: C compiler benchmarking

Post by czietz »

Well, CoreMark is open-source: https://github.com/eembc/coremark/ (or here for my Atari/gcc port: https://github.com/czietz/coremark/). By all means, benchmark all the C compilers you can find. ;)
ThorstenOtto
Fuji Shaped Bastard
Posts: 3069
Joined: Sun Aug 03, 2014 5:54 pm

Re: C compiler benchmarking

Post by ThorstenOtto »

Code: Select all

SozobonX (32bit ints):  1.46
SozobonX (16bit ints):  1.49
Quite surprising. Better than gcc 2.95, and also better than vbcc. And I definitely have to figure out what's wrong with Pure-C.

BTW, I could not reproduce your problem with using -O2 for vbcc. I also only get 0.66 iterations/sec with that compiler (I'm using version 0.9g, have to check whether there are any optimizations made only in 0.9h).
Last edited by ThorstenOtto on Tue Sep 06, 2022 12:43 pm, edited 2 times in total.
metalages
Captain Atari
Posts: 422
Joined: Thu Jun 06, 2013 5:14 pm
Location: France

Re: C compiler benchmarking

Post by metalages »

In case it can help...
Compiling without any standard lib, I have noticed that Pure C sometimes generates useless ldiv calls to manage pointer arithmetic instead of right shifting in some simple cases.
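
A minimal sketch of the arithmetic involved, assuming 4-byte elements: subtracting two pointers means dividing their byte distance by the element size, and for a power-of-two size that division can be a plain right shift. The function names are made up for the example, and I have not verified which exact Pure C constructs trigger the ldiv call.

```c
#include <stddef.h>
#include <stdint.h>

/* Element index the ordinary way: the compiler has to divide the byte
 * distance by sizeof(int32_t), ideally with a shift. */
ptrdiff_t index_by_subtraction(const int32_t *base, const int32_t *p)
{
    return p - base;
}

/* The same result with the shift written out by hand.
 * Valid only because sizeof(int32_t) == 4 == 1 << 2, and assuming
 * p >= base so that no negative value is shifted. */
ptrdiff_t index_by_shift(const int32_t *base, const int32_t *p)
{
    ptrdiff_t bytes = (const char *)p - (const char *)base;
    return bytes >> 2;
}
```

If the compiler emits a library division call for the first form, writing the shift explicitly as in the second form is one possible workaround in hot loops.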
czietz
Hardware Guru
Posts: 2514
Joined: Tue May 24, 2016 6:47 pm

Re: C compiler benchmarking

Post by czietz »

ThorstenOtto wrote: Tue Sep 06, 2022 12:09 pm BTW, I could not reproduce your problem with using -O2 for vbcc. I also only get 0.66 iterations/sec with that compiler (I'm using version 0.9g, have to check whether there are any optimizations made only in 0.9h).
Different compiler version, maybe? I use the latest Windows binary release from the VBCC website. You can find both my version with -O2 (which is broken!) and with "-O3" attached.
ThorstenOtto
Fuji Shaped Bastard
Posts: 3069
Joined: Sun Aug 03, 2014 5:54 pm

Re: C compiler benchmarking

Post by ThorstenOtto »

Definitely different compiler versions; I use 0.9g on Linux. Did you compile CoreMark for 16-bit ints or for 32-bit? I just tried to recompile for 16-bit, but that seems to hang.
masteries
Atari Super Hero
Posts: 506
Joined: Thu Jul 16, 2015 4:05 pm

Re: C compiler benchmarking

Post by masteries »

Very interesting results with GCC 8 and 9.

Thanks for the benchmark!
The inner mastery...

Metal Slug for Atari STE: https://www.youtube.com/watch?v=FMrdjrrtxWo
https://www.youtube.com/watch?v=hgW6Fc5Jli0

Low Cost Hard Disk for Atari ST/E (now it reaches 1 MB/s reading and 700 KB/s writing):
viewtopic.php?f=33&t=40018
https://www.youtube.com/watch?v=Qn9IwKo-EoA
czietz
Hardware Guru
Posts: 2514
Joined: Tue May 24, 2016 6:47 pm

Re: C compiler benchmarking

Post by czietz »

For 32 bit ints, using the "tos" config file. In any case, I didn't want to make this thread a VBCC debugging session ;)
Eero Tamminen
Fuji Shaped Bastard
Posts: 3704
Joined: Sun Jul 31, 2011 1:11 pm

Re: C compiler benchmarking

Post by Eero Tamminen »

ThorstenOtto wrote: Tue Sep 06, 2022 12:09 pm

Code: Select all

SozobonX (32bit ints):  1.46
SozobonX (16bit ints):  1.49
Quite surprising. Better than gcc 2.95, and also better than vbcc.
SozobonX results are really interesting. I used that quite a lot in the '90s. Wasn't the original ST-Guide compiled with it?
ThorstenOtto wrote: Tue Sep 06, 2022 12:09 pm And I definitely have to figure out what's wrong with Pure-C.
Building binaries with symbols, profiling the resulting binaries with Hatari (= "profile on" + setting breakpoints on benchmarking start & end), running the post-processor on the saved profile ("profile save profile.txt"), and comparing the outputs tells you whether the problem is with some specific functions or all over the code.
ThorstenOtto
Fuji Shaped Bastard
Posts: 3069
Joined: Sun Aug 03, 2014 5:54 pm

Re: C compiler benchmarking

Post by ThorstenOtto »

Yes, sorry. But something is strange here. I recompiled everything again, only changing portme.h to report the correct compiler version, and now I suddenly only get 0.26 iterations/sec... dunno what's happening.

Anyway, other results:

Code: Select all

Lattice 5.06 (32bit ints) 1.06
Lattice 5.06 (16bit ints) 0.85
Lattice 5.60 (32bit ints) 1.21 (validation failed!!, without optimization it works)
Lattice 5.60 (16bit ints) 1.05 (validation failed!!, without optimization it works)
Maybe you could put all these results on your wiki page, if you cannot edit your original post?

PS: the errors I get when compiling with Lattice 5.60 seem to be similar to the ones produced by your corevbo2.ttp.
ThorstenOtto
Fuji Shaped Bastard
Posts: 3069
Joined: Sun Aug 03, 2014 5:54 pm

Re: C compiler benchmarking

Post by ThorstenOtto »

Eero Tamminen wrote: Tue Sep 06, 2022 2:39 pm Wasn't the original ST-Guide compiled with it?
Very likely, yes. ST-Guide is by Holger Weets, who also did quite some work on SozobonX.
ThorstenOtto
Fuji Shaped Bastard
Posts: 3069
Joined: Sun Aug 03, 2014 5:54 pm

Re: C compiler benchmarking

Post by ThorstenOtto »

Eero Tamminen wrote: Tue Sep 06, 2022 2:39 pm Building binaries with symbols, profiling the resulting binaries with Hatari (= "profile on" + setting breakpoints on benchmarking start & end), running the post-processor on the saved profile ("profile save profile.txt"), and comparing the outputs tells you whether the problem is with some specific functions or all over the code.
The actual benchmark loop is here, I think: https://github.com/eembc/coremark/blob/ ... c#L265-284

However, the Iterate() function is also called earlier, to calibrate the number of iterations needed. Any idea how to profile only the time used for the actual loop?
Eero Tamminen
Fuji Shaped Bastard
Posts: 3704
Joined: Sun Jul 31, 2011 1:11 pm

Re: C compiler benchmarking

Post by Eero Tamminen »

ThorstenOtto wrote: Tue Sep 06, 2022 6:15 pm The actual benchmark loop is here, I think: https://github.com/eembc/coremark/blob/ ... c#L265-284

However, the Iterate() function is also called earlier, to calibrate the number of iterations needed. Any idea how to profile only the time used for the actual loop?
As you're building it yourself, you could just add a label to the lines where you want the breakpoints.

(If GCC does not include symbols for unused labels in the binary for some reason, add trivial non-local functions that are called at the start and end of the part you're interested in.)
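
The marker-function idea could look roughly like the sketch below. All names (`profile_begin_marker`, `profile_end_marker`, `run_timed_section`) are hypothetical, and the summing loop is only a stand-in for the timed part of the benchmark. The `noinline` attribute and the counter are assumptions added so that an optimizing compiler keeps the functions and their symbols.

```c
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Touching a volatile counter gives the markers a side effect,
 * so the calls cannot be optimized away. */
volatile unsigned long marker_hits;

/* Non-static, so the symbol names end up in the binary's symbol table
 * and can be used as breakpoint addresses in the debugger. */
NOINLINE void profile_begin_marker(void) { marker_hits++; }
NOINLINE void profile_end_marker(void)   { marker_hits++; }

/* Stand-in for the section of interest: sums 0 .. iterations-1. */
unsigned long run_timed_section(unsigned long iterations)
{
    unsigned long i, sum = 0;

    profile_begin_marker();   /* set the "start" breakpoint here */
    for (i = 0; i < iterations; i++)
        sum += i;
    profile_end_marker();     /* set the "end" breakpoint here */
    return sum;
}
```

In the real benchmark, the two marker calls would go around the timed loop only, so the earlier calibration run is not included in the profile.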