gcc calling convention

simonsunnyboy
Moderator
Posts: 5129
Joined: Wed Oct 23, 2002 4:36 pm
Location: Friedrichshafen, Germany

gcc calling convention

Postby simonsunnyboy » Sun Nov 03, 2013 3:34 pm

Today I learned that gcc passes parameters on the stack (which is ok for clean applications). Is it possible to change to the classic register passing model like Pure C/AHCC?

At the moment I'm tempted to switch to gcc, but the thing above is a dealbreaker, as I already have a codebase that needs porting and I don't really like the loss of even more cycles.
Simon Sunnyboy/Paradize - http://paradize.atari.org/

Stay cool, stay Atari!

1x2600jr, 1x1040STFm, 1x1040STE 4MB+TOS2.06+SatanDisk, 1xF030 14MB+FPU+NetUS-Bee

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Re: gcc calling convention

Postby dml » Sun Nov 03, 2013 6:07 pm

simonsunnyboy wrote:Today I learned that gcc passes parameters on the stack (which is ok for clean applications). Is it possible to change to the classic register passing model like Pure C/AHCC?

At the moment I'm tempted to switch to gcc, but the thing above is a dealbreaker, as I already have a codebase that needs porting and I don't really like the loss of even more cycles.


If you're porting old code this can be a problem if the old code is poorly laid out. If you're writing new code or the project is in good shape, the compiler is good at inlining, unrolling and other more global optimizations so most of the high performance functions won't be called but merged/fused in-context. The low performance, large functions will be left instanced, but the overhead of passing args to those disappears into the cost of the function anyway. It's the small, frequent ones that bite and those are the ones to be inlined.

Failing that, you could try the GCC attributes. I haven't checked that it is supported on the m68k target but it's worth a try.

The relevant attribute is:

__attribute__((fastcall))

Place it in front of your function declaration. It should be the declaration (header) not just the definition since the calling code needs to see it.

mfro
Atari Super Hero
Posts: 808
Joined: Thu Aug 02, 2012 10:33 am
Location: SW Germany

Re: gcc calling convention

Postby mfro » Sun Nov 03, 2013 6:39 pm

dml wrote:Failing that, you could try the GCC attributes. I haven't checked that it is supported on the m68k target but it's worth a try.


I'm afraid the fastcall attribute is not implemented for m68k but also think - as you have already explained - it's not necessary anyway.

Short functions get properly inlined; for large functions the calling overhead is negligible.

Not having the choice of different ABIs saves you a lot of headaches, though ;).

Re: gcc calling convention

Postby dml » Sun Nov 03, 2013 10:45 pm

mfro wrote:I'm afraid the fastcall attribute is not implemented for m68k


Oh well, now we know. I don't rely on it so I wasn't sure - some of those attributes are sparsely supported on the various targets.

mfro wrote:but also think - as you have already explained - it's not necessary anyway.
Short functions get properly inlined; for large functions the calling overhead is negligible.
Not having the choice of different ABIs saves you a lot of headaches, though ;).


Indeed all true. I was concerned at one point that m68k gcc passes all args as longwords regardless of size, but in fact it's just an alignment thing. It still uses word access for short args - the gaps are unused padding, at least on plain m68k (Maybe not so on 020+)


To simonsunnyboy... here are a few tips to help avoid the cost of arg passing. By tips I mean these methods work... but be aware that picking the right one at the right time will help you avoid making a large scale mess of your program. Optimization is always best done by forward planning, design and choice of solutions - anything not covered by that gets done last of all.

Some C tips:

1) Declaring a function 'static' tells the compiler the code can't be seen/used from outside that sourcefile and is therefore categorized as 'anything goes' for optimization purposes. The compiler *will* use that information and do radical things to the code. It's your #1 optimization hint for functions. It can make arg passing 'disappear'.

2) Not declaring a function 'static' (the more common case) means the compiler has to treat it as a public export which can be seen and used from other source files. It therefore has to instance the code as a real function and use the standard arg passing convention (stack) to reach it. This tends to be implicit because such functions can't live in a header file (due to the symbol collisions they would cause), so chances of inlining are limited anyway. But....

3) If you declare a function 'static' it will not 'collide' with other functions with the same name in other sourcefiles, since it's effectively private to the sourcefile. This means you can stick it in a header file and share it across a whole project, without getting duplicate symbol problems at linktime.

4) Declaring a function 'static inline' does the same as 'static', but is more likely to encourage inlining and discourage instancing (making the function 'real' and callable). This in turn means you can put complete functions in header files (sometimes called 'implementation' headers) which optimize/fuse into other code very well without any arg passing involved. Watch out, because the 'inline' keyword didn't exist in ANSI C (C89); it only became official in C99, while C++ has it as a standard keyword. If using an older dialect, it may exist as an extension instead, e.g. __inline.

5) If you try to take the address of such a static/inline function (say, to make a function pointer out of it), this may not prevent it from inlining in other locations - but it will force it to be instanced in that sourcefile to produce a valid address. This means the code you call through your function pointer isn't the code that's being used when the function is called directly (even if they do the same job). While this is normally not a problem, it could get very confusing when taking into account (***) below.
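To make tips 1 to 5 a bit more concrete, here is a minimal C sketch. The function names and the idea of a shared header are made up for illustration, and whether the call really gets inlined depends on the compiler and its optimization settings, so treat it as a sketch rather than a guarantee.

```c
#include <stdint.h>

/* Imagine this function living in a shared header (hypothetical example).
   'static inline' keeps it private to each source file that includes it,
   so there are no duplicate symbols at link time. */
static inline int16_t clamp16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* A direct call: the compiler is free to fuse clamp16() into this loop,
   in which case no arguments are pushed on the stack at all. */
static int32_t sum_clamped(const int32_t *values, int count)
{
    int32_t total = 0;
    int i;

    for (i = 0; i < count; i++)
        total += clamp16(values[i]);
    return total;
}

/* Taking the address (tip 5) forces a real, callable instance of
   clamp16() to exist in this source file, even if the direct calls
   above are still inlined. */
static int16_t (*const clamp_ptr)(int32_t) = clamp16;
```

Calling sum_clamped() on the values 10, 40000 and -40000 gives 10 + 32767 - 32768, and clamp_ptr(40000) goes through the instanced copy and returns 32767.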


Some C++ tips

6) You can fully define a function within a class, and it gets treated with inline semantics by default, with or without the keyword. It also doesn't cause symbol collisions because it's been defined as part of the class. You can therefore implement whole classes fully within header files and share them, and they are likely to inline in most circumstances (this is not necessarily appropriate, but it is nonetheless true).

7) Defining a function 'static' in a C++ class means it doesn't need any state from the class, and that's one less pointer needed when calling it. However you don't really want to design a class around static functions. It has specific purposes - just be aware it implicitly costs the same or less than a normal class method depending on the compiler.

8) C++ has awesome code generation/expansion capabilities via templates - a kind of 'super macro' language wrapped around/into C. It's hard to learn the depths of it, but the performance possible from meta-programs written in terms of templates is unbeatable because such programs can calculate a lot of stuff during compilation, leaving only the unknown, dynamic work to the runtime. The biggest downside is the awful spaghetti error messages which result from the simplest syntax foibles. But you do get used to it :)


(***) Be very, very careful with 'static inline' when using C++. You can also declare variables static, which means the variable is shared and persistent across any calls using it (effectively a local variable that behaves like a global variable). Using these within a 'static inline' function is usually disastrous, because it can easily create multiple (fused) instances of the function across the program, and therefore also the variable, each with its own state. Since you don't know how many instances of the inline function get created in the program, you can't tell how many copies of the variable there are either. Don't mix static vars with static funcs unless you really mean to, and probably don't mix them with static inline funcs, ever. Even if it seems like it should work - recommend avoid.
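The pitfall can be simulated in a single C file. The macro below is only a stand-in for "the same header included by two source files"; the macro and function names are hypothetical. At the very least each source file that includes such a header gets its own copy of the function, and with it its own copy of the static variable.

```c
/* Stand-in for a header that defines a static function containing a
   static local variable. Each use of the macro stamps out a separate
   function with its own private counter, which is what happens when
   the same header is included by two different source files. */
#define DEFINE_NEXT_ID(name)        \
    static int name(void)           \
    {                               \
        static int counter = 0;     \
        return ++counter;           \
    }

DEFINE_NEXT_ID(next_id_in_file_a)   /* the copy seen by "file A" */
DEFINE_NEXT_ID(next_id_in_file_b)   /* the copy seen by "file B" */
```

Calling next_id_in_file_a() twice returns 1 and then 2, but the first call to next_id_in_file_b() returns 1 again, because it counts in a separate variable. Code that expected one shared counter across the program silently gets two.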


These tips were limited to optimizing how code is defined / instanced between modules and calling overhead. It's not meant to be an exhaustive list of optimizations - just things that affect argument passing.

Have fun.

Re: gcc calling convention

Postby simonsunnyboy » Mon Nov 04, 2013 6:44 am

Thanks for your comments. My code base consists mostly of assembly language subroutines. It is not the C part that needs optimizing, and I'm well aware of static and such things; embedded C is what I do for a living. Rewriting stuff in C and hoping gcc will optimize the code well enough on 000 and 030 targets is a burden of work that I will not take on. Time for Atari is too scarce these days :(

Re: gcc calling convention

Postby dml » Mon Nov 04, 2013 9:54 am

simonsunnyboy wrote:Thanks for your comments. My code base consists mostly of assembly language subroutines. It is not the C part that needs optimizing, and I'm well aware of static and such things; embedded C is what I do for a living. Rewriting stuff in C and hoping gcc will optimize the code well enough on 000 and 030 targets is a burden of work that I will not take on. Time for Atari is too scarce these days :(


If your code is mostly asm, and it's just being called from C - there's no real C to optimise... then it should be simple to estimate the impact of switching compilers.

1) Since there's nothing to optimise, there likely won't be any gain from switching
2) Unless the 'grain' of those asm functions is very fine (lots of small calls), the loss of cycles from calls to non-inline asm funcs won't be noticeable either. If the grain is too fine - and you're concerned about performance - that's probably something you'd want to fix for any compiler, not just gcc.

So it seems that you probably won't have much to gain or lose, performance wise.

Re: gcc calling convention

Postby simonsunnyboy » Mon Nov 04, 2013 4:45 pm

The routines I have are multipurpose and intended to be used from m68k, GFABASIC and C alike, with only minimal changes.
I am aware that gcc optimizes well in comparison to other Atari compilers, but that doesn't help me if my routines need a major rewrite and retesting for gcc.

Any volunteers to port to gcc? http://paradize.atari.org/ -> AHCC libraries :lol:

mikro
Hardware Guru
Posts: 2034
Joined: Sat Sep 10, 2005 11:11 am
Location: Kosice, Slovakia

Re: gcc calling convention

Postby mikro » Fri Nov 08, 2013 1:48 am

Why not use vbcc then? You can specify the calling convention by hand using attributes.

Re: gcc calling convention

Postby simonsunnyboy » Fri Nov 08, 2013 4:10 pm

I have considered VBCC, but does it run on a 68000 setup like AHCC? Another issue keeping me from switching to VBCC is that it probably has yet another calling convention, plus the effort of getting up to speed.

I'll stick with AHCC for the moment as I seldom write really big code. I toyed with compiled sprites for Falcon truecolor at the beginning of the week and results with AHCC were acceptable and not slower than GFABASIC (from which I have been switching since 2009/10).

Ato
Captain Atari
Posts: 300
Joined: Tue Aug 10, 2010 3:27 am
Location: Duisburg, Germany

Re: gcc calling convention

Postby Ato » Fri Nov 08, 2013 11:31 pm

simonsunnyboy wrote:I toyed with compiled sprites for Falcon truecolor at the beginning of the week and results with AHCC were acceptable and not slower than GFABASIC (from which I have been switching since 2009/10)


Whoa, wait a sec! How is it possible that compiled code is as slow as interpreted code? -> You did not mention that you compiled the GfA Basic code with its compiler.

Cheers,
T.

Re: gcc calling convention

Postby simonsunnyboy » Sat Nov 09, 2013 9:26 am

If the C compiler or some inline assembly is badly written, the result can easily be slower than GFABASIC, as its compiler is not that bad.
Code for Falcon truecolor video (as mentioned) especially needs careful optimization, e.g. m68k code, or it is too slow for anything other than a slideshow. Copying a fullscreen picture with AHCC memcpy took a dozen VBLs (one could watch it happen), while my own copy routine (probably not totally optimized) does it in less than 2, possibly 1.

"C is faster than GFA" always depends on the code style and the quality of the compiler.

*EDIT* When talking about GFA, I always have the compiled result in mind, as that is what the end user will get. My routines (as they are normally m68k add-ons) work equally well interpreted.

Henk Robbers
AHCC Developer
Posts: 39
Joined: Mon Nov 14, 2011 2:37 pm

Re: gcc calling convention

Postby Henk Robbers » Sun Nov 10, 2013 10:48 pm

I am afraid that AHCC's memcpy is a very opportunistically written C routine (just a simple char loop).
Written with the aim to be correct with little effort.
This has nothing to do with inherent qualities of compiled or interpreted code.

Feel free to produce better performing versions.

Re: gcc calling convention

Postby Ato » Sun Nov 10, 2013 11:39 pm

Henk Robbers wrote:Feel free to produce better performing versions.


:) I sense there's plenty of room for discussion. And my feeling is that it'll automagically lead to the software developer's dilemma: shall I check the number of bytes to be cleared/transferred/copied in order to branch to the optimised method, and thus pay the price for the comparison and branch, or do I just use the generic version and have non-optimised, i.e. perfect, code executed for all cases? My advice: never optimise for speed before you have hard evidence that the generic case does not perform well enough.

OK, to cut to the chase, here's the bottom line, i.e. executive summary (which of course should have come at the top of the posting): in my opinion there's no reason to worry because the char-by-char version will in most cases perform A-OK unless proven otherwise.

Cheers,
T.

Re: gcc calling convention

Postby simonsunnyboy » Mon Nov 11, 2013 4:22 pm

The C function memcpy is generic and thus slow. My special use case is a fast screen (background) update, so obviously some optimized routine for the special data geometry is needed.

I doubt a generic memcpy can be optimized much; it is meant to be flexible. The main trick I learned is to copy as much data as possible via registers, e.g. by doing a lot of movem. Maybe a memcpy() could do this for large blocks, e.g. copy multiples of 8 longs with movem first and then copy the remaining longs directly....

But it adds logic and thus more overhead.

This was not meant as criticism of AHCC; it is a generic fault of probably almost all generic memcpy() implementations.
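The "multiples of 8 longs first, then the rest" idea can be sketched in plain C roughly like this. The function name is hypothetical and not part of AHCC or any libc, both buffers are assumed to be long-aligned and non-overlapping, and there is no guarantee that a given compiler turns the unrolled body into movem instructions; a hand-written m68k routine would still be needed for that.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy 'count' 32-bit longs from src to dst. Both pointers are assumed
   to be long-aligned and the buffers must not overlap (hypothetical
   helper for illustration only). */
static void copy_longs(uint32_t *dst, const uint32_t *src, size_t count)
{
    size_t blocks = count / 8;   /* full groups of 8 longs */
    size_t rest   = count % 8;   /* leftover longs */

    while (blocks--) {
        /* Eight moves per iteration: the C analogue of loading eight
           registers with movem and storing them again. */
        dst[0] = src[0]; dst[1] = src[1]; dst[2] = src[2]; dst[3] = src[3];
        dst[4] = src[4]; dst[5] = src[5]; dst[6] = src[6]; dst[7] = src[7];
        dst += 8;
        src += 8;
    }

    while (rest--)
        *dst++ = *src++;         /* copy the last longs directly */
}
```

The extra logic is just one division and one remainder up front, so for something the size of a screen buffer that overhead should be small compared to the copy itself; for very short copies it may well not pay off.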