Falcon Doom


Eero Tamminen
Atari God
Posts: 1481
Joined: Sun Jul 31, 2011 1:11 pm

Re: Falcon Doom

Postby Eero Tamminen » Wed Jan 16, 2013 11:28 pm

dml wrote:There is no code/rendering profiler in BM. This would have been a good move - seeing the %age impact of various kinds of drawing and relative use, for guiding sensible optimisation effort. A prime-number based TimerC event sampler coupled with a task index updated in the main code would be enough to build a decent picture over a few seconds and quite easy to implement.


No need to implement that in BM: the Hatari debugger includes a profiler, and it supports both the CPU and the DSP side:
http://hg.tuxfamily.org/mercurialroot/h ... e_debugger

If you load DSP symbol address information to the debugger, it can also show how many times certain symbols get called, or a trace of how the symbols get called.

dml
Fuji Shaped Bastard
Posts: 3427
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Thu Jan 17, 2013 9:26 am

Eero Tamminen wrote:No need to implement that in BM: the Hatari debugger includes a profiler, and it supports both the CPU and the DSP side:
http://hg.tuxfamily.org/mercurialroot/h ... e_debugger

If you load DSP symbol address information to the debugger, it can also show how many times certain symbols get called, or a trace of how the symbols get called.


Thanks for that. It will be very useful - especially the DSP profiler which is something I didn't have access to before. I think the original DSP debugger for the Falcon showed cycle times (or stalls) next to DSP instructions, taking external memory sources into account. That was pretty useful because you could tell quickly if relocating code or buffers into local/low memory would yield an instant win without inspecting the addresses. A small thing, but really quite valuable.

The CPU profiler is probably really useful for getting a good view of whole program performance. I'd probably use a mix of profiling techniques for BadMood - finding concurrency bottlenecks can involve a bit more than symbols and an address sampling profiler can be quite useful there. Optimizing fragments is also sometimes best done with a separate profiling harness.


I noticed some strange timing differences between Hatari and a real 16MHz Falcon when I was running some tests late last year and I am wondering if Hatari's CPU tries to emulate the 68030 cache and Falcon bus behaviour on a per-instruction basis, or is it approximated globally in some way? e.g. standard times are entered and adjusted for all instructions + external bus, and the cache hit/miss is 'rolled in' globally to all figures so the average performance is the same as a real machine, even if individual instructions are not? I don't really know how it works but I started thinking that exact cache performance emulation would be complex to do, and isn't very important for emulation of a machine with a cache, since the presence of a cache generally rules out timing-sensitive software anyway. Perhaps you (or LaurentS?) can shed some light on this side of Hatari as I am really just guessing how it works.


Anyway for the majority of cases - especially for concurrency profiling - any tiny variations are unlikely to be visible so I might be able to do everything in Hatari once I get a feel for it - which would be a real bonus since I'll be able to do stuff while travelling :-)

(Thanks to the Hatari team for the convenience of Falcon emulation!)

LaurentS
Captain Atari
Posts: 250
Joined: Mon Jan 05, 2009 5:41 pm

Re: Falcon Doom

Postby LaurentS » Thu Jan 17, 2013 11:14 am

Hi,

The general timings of Hatari in Falcon mode are quite good and quite approximate at the same time :).

For general emulation the result is usable: many programs (music, demos, ...) work correctly.
But in some conditions the timings are wrong, resulting in music distortion or glitches, or in the program freezing (falcamp, ...).

The CPU code handles the cache/no-cache mechanisms, but the timings are quite wrong.

I started to implement a static table with the 68030 Falcon timings. I use it with the new CPU, but it's far from being accurate (and a static table is probably not the best approach for an emulator).
The problem is that I have reached my limits here, but hatari-falcon would benefit a lot from better cycle timing emulation.

The DSP timings should be OK (in terms of cycles), taking into account the internal or external DSP RAM access timings.
I reworked that last year.

But the DSP cycles are derived from the CPU cycles (which in Hatari are the main clock of the whole emulator).
If the CPU cycle counter is wrong, the DSP will execute the wrong number of cycles and you can get desynchronization that doesn't exist on real hardware (such as sound distortion, or programs freezing because they don't transfer data in sync between the CPU and the DSP).

The help of an expert in 68030 cycles and the Falcon architecture would be great for Hatari.
But it's still a huge task.

Regards

Laurent

dml
Fuji Shaped Bastard
Posts: 3427
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Thu Jan 17, 2013 3:20 pm

Thanks Laurent, that makes complete sense.

I can imagine some of the issues involved with making cycle timings exact for any machine with a cache and buffered writes - anyway I was pretty amazed to see the *average* performance looks pretty much the same as a real Falcon so the approach appears to work. :-)

Eero Tamminen
Atari God
Posts: 1481
Joined: Sun Jul 31, 2011 1:11 pm

Re: Falcon Doom

Postby Eero Tamminen » Thu Jan 17, 2013 10:49 pm

dml wrote:Thanks for that. It will be very useful - especially the DSP profiler which is something I didn't have access to before. I think the original DSP debugger for the Falcon showed cycle times (or stalls) next to DSP instructions, taking external memory sources into account. That was pretty useful because you could tell quickly if relocating code or buffers into local/low memory would yield an instant win without inspecting the addresses. A small thing, but really quite valuable.

The CPU profiler is probably really useful for getting a good view of whole program performance. I'd probably use a mix of profiling techniques for BadMood - finding concurrency bottlenecks can involve a bit more than symbols and an address sampling profiler can be quite useful there. Optimizing fragments is also sometimes best done with a separate profiling harness.


Some notes:

The profiler is really simple: it just has a memory-sized array of structs whose counters it increments after each instruction, so in principle the profiling information should be as accurate as the Hatari CPU core's instruction cycle count information is. But keep in mind that the information is per address, so if you change the code in memory (e.g. upload different code to DSP RAM), you should profile that separately.
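
To make the description above concrete, here is a minimal Python sketch of a per-address profiler of this kind. It is not Hatari's actual code (Hatari is written in C); the names `Counter`, `AddressProfiler`, `record` and `hottest`, and the tiny 16-slot "memory", are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    count: int = 0    # how many times the instruction at this address ran
    cycles: int = 0   # total cycles spent on it


class AddressProfiler:
    """One counter struct per memory address (hypothetical sketch)."""

    def __init__(self, memory_size: int) -> None:
        self.counters = [Counter() for _ in range(memory_size)]

    def record(self, pc: int, cycles: int) -> None:
        # Called once after every executed instruction.
        slot = self.counters[pc]
        slot.count += 1
        slot.cycles += cycles

    def hottest(self, n: int):
        # Addresses that were executed, sorted by total cycles, busiest first.
        used = [(addr, c) for addr, c in enumerate(self.counters) if c.count]
        used.sort(key=lambda item: item[1].cycles, reverse=True)
        return used[:n]


profiler = AddressProfiler(memory_size=16)
# A two-instruction loop executed 3 times, then one exit instruction.
for _ in range(3):
    profiler.record(pc=4, cycles=12)
    profiler.record(pc=6, cycles=20)
profiler.record(pc=8, cycles=4)

for addr, c in profiler.hottest(2):
    print(f"${addr:06x}: count={c.count} cycles={c.cycles}")
```

Because the counters are keyed by address only, different code later loaded at the same addresses would accumulate into the same slots, which is why re-uploaded code needs to be profiled separately.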

You can automate things. You can tell the debugger to parse commands from a file; those commands can load symbols at given addresses, set up breakpoints, and set up profiling or get profiling data. You can also chain things so that additional things happen in response to hitting specific breakpoints, collecting profiling output as the program proceeds from one breakpoint to another. See the manual for more information on this.

The profiler doesn't process the profiling data to aggregate time spent within functions, as more advanced profilers do to show the cumulative time spent within a given function (symbol) and everything it calls before returning. For that you will (for now) need something within the program itself that just runs the function you're interested in in a loop, times how long it takes and calculates the average for a single run.
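
The "run it in a loop and average" approach can be sketched as follows. This is a Python illustration of the idea only; on the Falcon the same thing would be done in the program's own language against a hardware timer. `average_runtime` and `draw_column` are hypothetical names, and `draw_column` is just a stand-in workload.

```python
import time


def average_runtime(func, runs=1000):
    """Call func() `runs` times and return the mean seconds per call."""
    start = time.perf_counter()
    for _ in range(runs):
        func()
    elapsed = time.perf_counter() - start
    return elapsed / runs


def draw_column():
    # Hypothetical stand-in for the routine under test.
    return sum(i * i for i in range(64))


mean = average_runtime(draw_column, runs=200)
print(f"{mean * 1e6:.2f} us per call")
```

Timing many runs and dividing keeps the timer's own resolution and overhead small relative to the thing being measured, at the cost of measuring the routine in isolation rather than in its real cache and bus context.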

I've used the profiler mainly to find out where apparently frozen programs, or TOS, have gotten stuck. Therefore any feedback on the accuracy or the usefulness of the provided data will be greatly appreciated (preferably to the hatari-devel@lists.tuxfamily.org mailing list, not this forum, as I read this only infrequently).

I've made some significant usability fixes [1] to the Hatari debugger since the last Hatari release (v1.6.2), so I would recommend building the latest Hatari from Mercurial [2].

Hatari has 2 CPU cores, the WinUAE one and the "old" UAE one. To switch between them, you need to re-run (c)cmake and rebuild Hatari. The WinUAE one has (mostly) more cycle-accurate 030+DSP emulation, but lacks support for some minor emulator features [3]. The "old" UAE core works fine for all Atari models and is mostly fine unless you need more cycle-accurate 030 emulation. In the binary releases the "old" UAE core build is called "hatari(.exe)" and the WinUAE core build is called "hatari_falcon(.exe)".

[1] see: http://hg.tuxfamily.org/mercurialroot/h ... -notes.txt
[2] http://hatari.tuxfamily.org/download.html (or I can send a source tarball if I get your email address, it's 1.6MB)
[3] http://hg.tuxfamily.org/mercurialroot/h ... c/todo.txt

dml
Fuji Shaped Bastard
Posts: 3427
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Fri Jan 18, 2013 1:20 pm

Again, thanks.

I have played a little now with the Hatari debugger and profiler so I get the general idea. I had trouble getting HiSoft symbols imported (had to hand-reformat the text file) but it worked once I had done that.


So I did get some profiling information out of BadMood using both Hatari and a builtin context-sampling profiler. Both tell similar stories, although from quite different views. There is no single bottleneck as such in the code - the cost is spread out over many small things, but there is obvious bias by context - a significant portion of time is spent in both BSP (scene) and wall/floor related code which is closely related to rendering. Particularly getting stuff in and out of the DSP, although there is very little stalling on the CPU side of that.
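
For readers unfamiliar with context sampling, here is a small Python simulation of the idea: the main code publishes which stage it is in, and a periodic timer event samples that value into a histogram. It is not BadMood's profiler; the stage names, the per-frame tick costs and the sample period are all made-up assumptions.

```python
from collections import Counter

# Hypothetical task contexts; the main code would store the current index
# in a variable before entering each stage.
TASKS = ["bsp_walk", "walls", "floors", "dsp_transfer"]

# Assumed cost of each stage in timer ticks per frame (made-up numbers).
TICKS_PER_FRAME = {"bsp_walk": 30, "walls": 40, "floors": 20, "dsp_transfer": 10}

SAMPLE_PERIOD = 7  # prime, so sampling does not lock onto the 100-tick frame


def task_at(tick: int) -> str:
    """Return which task the simulated main loop is in at a given tick."""
    frame_length = sum(TICKS_PER_FRAME.values())
    position = tick % frame_length
    for name in TASKS:
        if position < TICKS_PER_FRAME[name]:
            return name
        position -= TICKS_PER_FRAME[name]
    raise AssertionError("unreachable")


def sample(total_ticks: int) -> Counter:
    """Simulate the timer event: record the current task every period."""
    hits = Counter()
    for tick in range(0, total_ticks, SAMPLE_PERIOD):
        hits[task_at(tick)] += 1
    return hits


hits = sample(total_ticks=70_000)
total = sum(hits.values())
for name in TASKS:
    print(f"{name:13s} {100 * hits[name] / total:5.1f}%")
```

The sample period matters: with a period of 50 ticks this simulated loop would only ever be observed at frame positions 0 and 50, so the sampler would report an even split between `bsp_walk` and `walls` and never see the other two stages.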

I can probably use the Hatari profiler to find out what percentage load the DSP is carrying as well, since idle time will show as excess activity in the command processing loop. If the DSP is idle a lot of the time it could be used more with some reorganization. I see there is still quite a bit of CPU-side math for walls which doesn't seem like it has to be done by the CPU.
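
As a rough sketch of that calculation in Python: given per-address cycle totals from a DSP profile, the load is the share of cycles spent outside the command-wait loop. The function name `dsp_load`, the address range of the wait loop and the cycle figures below are hypothetical and would depend on the actual DSP program.

```python
def dsp_load(profile, idle_start, idle_end):
    """Fraction of DSP cycles spent outside the idle/command-wait loop.

    profile: iterable of (address, cycles) pairs from a DSP profile.
    idle_start, idle_end: inclusive address range of the wait loop
    (hypothetical values; they depend on the actual DSP program).
    """
    profile = list(profile)
    total = sum(cycles for _, cycles in profile)
    idle = sum(cycles for addr, cycles in profile
               if idle_start <= addr <= idle_end)
    return (total - idle) / total if total else 0.0


# Made-up figures purely for illustration.
example = [(0x40, 6000), (0x41, 2000), (0x100, 1500), (0x101, 500)]
load = dsp_load(example, idle_start=0x40, idle_end=0x41)
print(f"DSP busy {load:.0%} of the time")
```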


So using a test build with sprites disabled, the e1m1 startpoint yields around 10.7fps with the rendering still on but pixel plotting bypassed (no texture sources or display writes), 8.1fps using flat fills (display writes only) and 4.0fps with full texturing (texture sources and writes).
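
Those three frame rates are easier to compare as frame times. The Python sketch below converts them to milliseconds and splits the full-texturing frame into three parts. It assumes the costs simply add up with no CPU/DSP overlap, which is a simplification; the variable names are illustrative.

```python
# Frame rates reported for the three test builds at the e1m1 start point.
fps = {"no_plotting": 10.7, "flat_fills": 8.1, "full_texturing": 4.0}

# Convert each frame rate to milliseconds per frame.
ms = {name: 1000.0 / rate for name, rate in fps.items()}

# Assumption: the three costs simply add up, with no CPU/DSP overlap.
everything_else = ms["no_plotting"]
display_writes = ms["flat_fills"] - ms["no_plotting"]
texture_sourcing = ms["full_texturing"] - ms["flat_fills"]

full_frame = ms["full_texturing"]
for label, cost in [
    ("everything except pixels", everything_else),
    ("display writes", display_writes),
    ("texture sourcing", texture_sourcing),
]:
    print(f"{label:26s} {cost:6.1f} ms  {100 * cost / full_frame:5.1f}%")
```

Under that additive assumption, roughly 93 ms of the 250 ms full-texturing frame is non-pixel work, about 30 ms is display writes and about 127 ms is texture sourcing.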

The fact that the framerate is still clearly capped with the pixel drawing hobbled suggests BadMood can be speeded up - since the pure display-writing bottleneck is gone and that kind of thing hits a hard limit quickly. However there are no easy targets I can see - there is no single thing that can be 'fixed' to make it run much faster. It needs work in multiple areas including textured rendering, texture cache, shards/spanbuffer, BSP walk, offloading more to DSP etc. Will just have to take bits in turn, probably starting again with the BSP walk and visibility tests and drilling down from there. But in some ways it's not so much a 'what to change?' as a 'what to leave alone?' :-P

I guess I knew all this at the time and just got tired picking at areas when nothing specific stood out.

Note: The 'profiling' entry in the pics is the time taken to perform the profiling %age calcs and text display - not the profiler's own overhead on the code, which is invisible.

bmprf1.png

bmprf2.png

bmprf3.png

nativ
Fuji Shaped Bastard
Posts: 4087
Joined: Mon Jul 30, 2007 10:26 am
Location: South West, UK

Re: Falcon Doom

Postby nativ » Fri Jan 18, 2013 2:02 pm

Lovin' the coder colours version 8) :lol:
Atari STFM 512 / STe 4MB / Mega ST+DSP / Falcon 4MB 16Mhz 68882 - DVD/CDRW/ZIP/DAT - FDI / Jaguar / Lynx 1&2 / 7800 / 2600 / XE 130+SD Card // Sega Dreamcast / Mega2+CD2 // Apple G4

http://soundcloud.com/nativ ~ http://soundcloud.com/nativ-1 ~ http://soundcloud.com/knot_music
http://soundcloud.com/push-sounds ~ http://soundcloud.com/push-records

dml
Fuji Shaped Bastard
Posts: 3427
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Fri Jan 18, 2013 7:18 pm

I tried a quick test on BadMood using 256 colour + c2p (chunky pixels -> atari bitplanes) instead of 16bit truecolour, before I go too far down the optimization route.

It's clearly not a win :-P

25+% of total time is now c2p, and the main rendering code is only a little faster with 8bit pixels. It could be speeded further for 8bit but it's not going to offset that extra cost whatever happens.

For the time being 16bit chunky is still the way to go on the 030 for BadMood.
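
For context, this is what c2p has to do, written as a deliberately naive Python reference. It is not the DHS/Kalms routine or anything used in BadMood; `c2p_16` is a hypothetical name. It converts one group of 16 chunky 8-bit pixels into 8 bitplane words in the interleaved Atari layout, where bit 15 of each word is the leftmost pixel.

```python
def c2p_16(pixels, planes=8):
    """Convert 16 chunky pixels into `planes` 16-bit bitplane words.

    Returns one word per plane, plane 0 first, as in the interleaved
    Atari screen layout. Bit 15 of each word is the leftmost pixel.
    """
    if len(pixels) != 16:
        raise ValueError("expected exactly 16 pixels")
    words = []
    for plane in range(planes):
        word = 0
        for pixel in pixels:
            # Shift in bit `plane` of each pixel, left to right.
            word = (word << 1) | ((pixel >> plane) & 1)
        words.append(word)
    return words


# Only the leftmost pixel is set, to colour 1: plane 0 gets bit 15.
print([f"{w:04x}" for w in c2p_16([1] + [0] * 15)])
```

Done this way, every group of 16 pixels costs 128 single-bit moves. Optimized routines instead transpose several longwords at once with shift-and-mask merge steps, but the work still has to be done for every pixel on screen, which is where the extra cost comes from.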

calimero
Atari God
Posts: 1939
Joined: Thu Sep 15, 2005 10:01 am
Location: STara Pazova, Serbia
Contact:

Re: Falcon Doom

Postby calimero » Fri Jan 18, 2013 8:24 pm

dml wrote:I tried a quick test on BadMood using 256 colour + c2p (chunky pixels -> atari bitplanes) instead of 16bit truecolour, before I go too far down the optimization route.

It's clearly not a win :-P

25+% of total time is now c2p, and the main rendering code is only a little faster with 8bit pixels. It could be speeded further for 8bit but it's not going to offset that extra cost whatever happens.

For the time being 16bit chunky is still the way to go on the 030 for BadMood.

hm... evil or kalms / dhs, or maybe mikro or the amiga coders from tbl, should have a lot of experience with c2p.

if I am not mistaken, 25% of CPU is far too much for c2p alone! I think there are much faster techniques today (mainly from the amiga world :))

but do not take my writing too seriously; best talk to ppl at http://dhs.nu/bbs-scene/ first :)
using Atari since 1986. http://wet.atari.org http://milan.kovac.cc/atari/software/ ・ Atari Falcon030/CT63/SV ・ Atari STe ・ Atari Mega4/MegaFile30/SM124 ・ Amiga 1200/PPC ・ Amiga 500 ・ C64 ・ ZX Spectrum ・ RPi ・ MagiC! ・ MiNT 1.18 ・ OS X

dml
Fuji Shaped Bastard
Posts: 3427
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Fri Jan 18, 2013 9:17 pm

calimero wrote:hm... evil or kalms / dhs, or maybe mikro or the amiga coders from tbl, should have a lot of experience with c2p.

if I am not mistaken, 25% of CPU is far too much for c2p alone! I think there are much faster techniques today (mainly from the amiga world :))

but do not take my writing too seriously; best talk to ppl at http://dhs.nu/bbs-scene/ first :)


I'll probably drop a msg there about it. I haven't looked at c2p since the late 90's. Although the one I had used in Quake looks quite similar in layout to the one I got off the DHS site tonight (I'm currently using the DHS one).

However I don't know if that's a favoured routine they posted or just a public example. And I should really be testing on a real Falcon to be sure as well...

Anyway I'll retest with anything interesting that turns up. It had just better be fast, because TC rendering doesn't seem very much slower than 8bit mode!

[edit]

...dropped a query at DHS on the c2p problem. BTW it was the 'Kalms' code I was experimenting with :) although there is a small chance I could have screwed something up in my brief test...
Last edited by dml on Sat Jan 19, 2013 11:50 am, edited 1 time in total.

Eero Tamminen
Atari God
Posts: 1481
Joined: Sun Jul 31, 2011 1:11 pm

Re: Falcon Doom

Postby Eero Tamminen » Fri Jan 18, 2013 9:36 pm

dml wrote:I have played a little now with the Hatari debugger and profiler so I get the general idea. I had trouble getting HiSoft symbols imported (had to hand-reformat the text file) but it worked once I had done that.


If you show what the HiSoft symbols text format looks like, or better, send an example (to address oak at helsinkinet fi), I can write a simple awk or python script that converts it to Hatari format and include it with Hatari.

dml wrote:I can probably use the Hatari profiler to find out what percentage load the DSP is carrying as well, since idle time will show as excess activity in the command processing loop. If the DSP is idle a lot of the time it could be used more with some reorganization. I see there is still quite a bit of CPU-side math for walls which doesn't seem like it has to be done by the CPU.


If there's something where the debugger / profiler could help, just ask and I'll look at how feasible it is. An emulator can in theory access all information in the emulated system, so a lot is possible, if not always practical. :-)

dml wrote:I guess I knew all this at the time and just got tired picking at areas when nothing specific stood out.


I know that feeling... (from looking into Qt GUI toolkit memory usage)

dml
Fuji Shaped Bastard
Posts: 3427
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Sat Jan 19, 2013 11:58 am

Eero Tamminen wrote:If you show what the HiSoft symbols text format looks like, or better, send an example (to address oak at helsinkinet fi), I can write a simple awk or python script that converts it to Hatari format and include it with Hatari.


Ok I'll forward a sample, although you may not thank me :) for some reason it's formatted like something you'd want to print vs parse into another program... I should look at Devpac manual to see if a more machine-friendly format is available.

Eero Tamminen wrote:If there's something where the debugger / profiler could help, just ask and I'll look at how feasible it is. An emulator can in theory access all information in the emulated system, so a lot is possible, if not always practical. :-)


It would be great to get a complete disassembly/dump of the program with the incidence 'counts' and cycle times immediately beside the code, either left or right side - doesn't matter. This would make for excellent analysis and some nice Python tools could be made to work on that information.

Equally useful for CPU and DSP.

(Perhaps it already does this or can be done with commands/script? I'm not quite done with the manual!)

Eero Tamminen wrote:I know that feeling... (from looking into Qt GUI toolkit memory usage)


:)

So far Hatari looks like a great dev tool as well as an emulator. I can already do a lot of cross-dev work without shuffling things between PC/Atari on floppies/flash cards etc..

kristjanga
Captain Atari
Posts: 400
Joined: Sat Jul 25, 2009 3:35 pm

Re: Falcon Doom

Postby kristjanga » Sat Jan 19, 2013 4:39 pm

my favorite thread at the moment :cheers:

lotek_style
Mod(ul)erator
Posts: 2319
Joined: Sat May 11, 2002 2:39 pm
Location: germany
Contact:

Re: Falcon Doom

Postby lotek_style » Sat Jan 19, 2013 7:39 pm

I have to apologize that I haven't read the whole thread... but Douglas... maybe this helps (not sure, as it's for the ST):

http://www.tscc.de/oldpage/c2ptut.html
lotek style / the sirius cybernetics corporation
- musician - ascii-artist - swapper - archivist -

.tSCc. - low-tech atari cyberpunks since 1990
http://www.tscc.de/ | http://demozoo.org/ | http://www.lotekstyle.de/ | http://ymrockerz.atari.org/

Cyprian
Atari God
Posts: 1331
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: Falcon Doom

Postby Cyprian » Sat Jan 19, 2013 8:07 pm

lotek, that c2p is not optimized for the 030.
As far as I remember, the best one for 030 and Falcon is prepared by Kalms and optimized by Mikro:
"Chunky to planar routines by Kalms / TBL and Mikro / Mystic Bytes "
http://dhs.nu/files.php?t=democreation
Jaugar / TT030 / Mega STe / 800 XL / 1040 STe / Falcon030 / 65 XE / 520 STm / SM124 / SC1435
SDrive / PAK68/3 / CosmosEx / SatanDisk / UltraSatan / USB Floppy Drive Emulator / Eiffel / SIO2PC / Crazy Dots / PAM Net
Hatari / Aranym / Steem / Saint
http://260ste.appspot.com/

Anima
Atari Super Hero
Posts: 597
Joined: Fri Mar 06, 2009 9:43 am
Contact:

Re: Falcon Doom

Postby Anima » Sat Jan 19, 2013 8:14 pm

dml wrote:I can already do a lot of cross-dev work without shuffling things between PC/Atari on floppies/flash cards etc..


I am using ZOC Terminal and CoNnect in combination with a USB adapter and a null modem cable to transfer the data. The transfer speed of 11 kb/s is quite ok.

Cheers
Sascha

dml
Fuji Shaped Bastard
Posts: 3427
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Sat Jan 19, 2013 8:27 pm

Anima wrote:I am using ZOC Terminal and CoNnect in combination with a USB adapter and a null modem cable to transfer the data. The transfer speed of 11 kb/s is quite ok.

Cheers
Sascha


Nice solution. I will definitely look at that. I prefer a cable-based route to swapping media any time.

Cheers
Doug

dml
Fuji Shaped Bastard
Posts: 3427
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Sat Jan 19, 2013 8:31 pm

Cyprian wrote:lotek, that c2p is not optimized for the 030.
As far as I remember, the best one for 030 and Falcon is prepared by Kalms and optimized by Mikro:
"Chunky to planar routines by Kalms / TBL and Mikro / Mystic Bytes "
http://dhs.nu/files.php?t=democreation


...which is good news and bad, as that's the first one I tried. However I'll return to it and check a bunch of things properly before I put it down. I considered several things afterwards that I should have looked into at the time I tested (such as: did it fit in the cache and/or require cache alignment in order to fit, and why did it not assemble out-of-the-box without needing to fix one of the short branches?)

I managed to get some small optimisations done today on BM but was hampered for much of the time by weird DSP tools issues. Mostly got that sorted now so it's all good.

dml
Fuji Shaped Bastard
Posts: 3427
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Sun Jan 20, 2013 11:11 am

Small read/write interleaving optimizations around the CPU<->DSP handshaking for floor spans + some extra pixel unrolling + inserting a lighting cache between textures and pixel drawing is showing a 17% speed increase over BM 3.07 using the default 320x168 / truecolour display. (Hatari testing only - with any luck the gain will be better on a real 030, which I know for sure can hide some instructions behind memory ops - but there's still a chance it may go the other direction and be a bit worse)

The lighting cache is not written yet but my speed test shows it's probably worth doing. I'm still just getting a feel for the various costs and where the two processors overlap (or not) so hopefully there are plenty more optimizations to be found.

c2p/256c isn't looking promising so far. The best methods are just too slow and it seems the highly optimized versions are targeted at plain 68000 or 68060 (and are quite different). There is no magic c2p for 68030 and there may never be one fast enough for this. For now that route is a dead-end.

Omikronman
Atari Super Hero
Posts: 514
Joined: Wed Dec 01, 2004 12:13 am
Location: Germany
Contact:

Re: Falcon Doom

Postby Omikronman » Sun Jan 20, 2013 12:01 pm

I did play around with Bad Mood 3.07 some days ago. It is still quite impressive how quick it is. :-)

Eero Tamminen
Atari God
Posts: 1481
Joined: Sun Jul 31, 2011 1:11 pm

Re: Falcon Doom

Postby Eero Tamminen » Sun Jan 20, 2013 3:09 pm

dml wrote:It would be great to get a complete disassembly/dump of the program with the incidence 'counts' and cycle times immediately beside the code, either left or right side - doesn't matter. This would make for excellent analysis and some nice Python tools could be made to work on that information.


For that you need the latest version of Hatari from the Mercurial repository, then:

Stop at program start to enable profiling:

Code:

b pc = text
c


Enable CPU and DSP profiling:

Code:

profile on
dspprofile on
c


After the program has run for a while (or hit a breakpoint you've set at a suitable place), enter the debugger and ask for the data:

Code:

profile addresses
dspprofile addresses


Logging the output is currently a bit of a pain though, as you cannot change the logging file in the middle (e.g. to get the CPU and DSP profiles into separate files).

Eero Tamminen
Atari God
Posts: 1481
Joined: Sun Jul 31, 2011 1:11 pm

Re: Falcon Doom

Postby Eero Tamminen » Sun Jan 20, 2013 3:12 pm

Output looks like this:

Code:

> profile addresses
$01f77a :             pea       $1fe18(pc)                 0.00% (1, 12)
$01f77e :             move.w    #$26,-(sp)                 0.00% (1, 16)
$01f782 :             trap      #$e                        0.00% (1, 12)
[...]
$01fe18 :             move.w    $ffff8266.w,d0             0.00% (1, 8)
$01fe1c :             btst      #4,d0                      0.00% (1, 12)
$01fe20 :             beq       $1fe2c                     0.00% (1, 10)
[...]
$01fe2c :             moveq     #$f,d4                     0.00% (1, 12)
$01fe2e :             moveq     #$3f,d3                    0.00% (1, 4)
$01fe30 :             move.w    #$25,-(sp)                 0.00% (64, 760)
$01fe34 :             trap      #$e                        0.00% (64, 768)
$01fe36 :             addq.l    #2,sp                      0.00% (64, 1280)

Values in parentheses are counts and cycles.
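
Since Python tooling around this output was mentioned earlier in the thread, here is a minimal sketch of a parser for it. It assumes every profile line has exactly the layout shown above (address, instruction, percentage, then count and cycles in parentheses); the regular expression and the name `parse_profile` are illustrative and not part of Hatari.

```python
import re

LINE_RE = re.compile(
    r"^\$(?P<addr>[0-9a-fA-F]+)\s*:\s+(?P<instr>.*?)\s+"
    r"(?P<pct>\d+\.\d+)%\s+\((?P<count>\d+),\s*(?P<cycles>\d+)\)\s*$"
)


def parse_profile(text):
    """Yield (address, instruction, count, cycles) for each profile line."""
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skips prompts, "[...]" gaps and blank lines
        instr = " ".join(match["instr"].split())  # collapse column padding
        yield (int(match["addr"], 16), instr,
               int(match["count"]), int(match["cycles"]))


SAMPLE = """\
> profile addresses
$01fe2e :             moveq     #$3f,d3                    0.00% (1, 4)
$01fe30 :             move.w    #$25,-(sp)                 0.00% (64, 760)
$01fe34 :             trap      #$e                        0.00% (64, 768)
$01fe36 :             addq.l    #2,sp                      0.00% (64, 1280)
"""

# Sort by total cycles, most expensive first, and show cycles per execution.
rows = sorted(parse_profile(SAMPLE), key=lambda row: row[3], reverse=True)
for addr, instr, count, cycles in rows:
    print(f"${addr:06x}  {instr:<20s} {count:5d} x {cycles / count:6.2f} cycles")
```

Dividing cycles by count gives the average cost per execution of each instruction, which is a convenient starting point for spotting slow memory accesses.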

calimero
Atari God
Posts: 1939
Joined: Thu Sep 15, 2005 10:01 am
Location: STara Pazova, Serbia
Contact:

Re: Falcon Doom

Postby calimero » Sun Jan 20, 2013 3:52 pm

dml wrote:Small read/write interleaving optimizations around the CPU<->DSP handshaking for floor spans + some extra pixel unrolling + inserting a lighting cache between textures and pixel drawing is showing a 17% speed increase over BM 3.07 using the default 320x168 / truecolour display.

Wow! 17% is quite good! And pixel drawing is the most time-consuming part...

dml wrote:c2p/256c isn't looking promising so far. The best methods are just too slow and it seems the highly optimized versions are targeted at plain 68000 or 68060 (and are quite different). There is no magic c2p for 68030 and there may never be one fast enough for this. For now that route is a dead-end.

25% of CPU time for C2P... :( I was convinced that it was less, but I was wrong!

so basically Amiga AGA demos had a penalty of 25% on the CPU while doing 3D texture mapped graphics. So the Falcon should be better for 3D/texturing, but I also see great demos on AGA (though it is hard to compare, since the low-end spec for Amiga 1200 3D demos is usually 030/50Mhz... 2.4x faster than the Falcon). offtopic: what is the most impressive 3D AGA demo for a stock Amiga1200 (with fast ram)?
using Atari since 1986. http://wet.atari.org http://milan.kovac.cc/atari/software/ ・ Atari Falcon030/CT63/SV ・ Atari STe ・ Atari Mega4/MegaFile30/SM124 ・ Amiga 1200/PPC ・ Amiga 500 ・ C64 ・ ZX Spectrum ・ RPi ・ MagiC! ・ MiNT 1.18 ・ OS X

dml
Fuji Shaped Bastard
Posts: 3427
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Sun Jan 20, 2013 5:09 pm

Eero Tamminen wrote:Output looks like this:


Brilliant! I'll be using this next week for sure.

dml
Fuji Shaped Bastard
Posts: 3427
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Sun Jan 20, 2013 5:14 pm

calimero wrote:Wow! 17% is quite good! And pixel drawing is the most time-consuming part...


...yes although there are some other heavy costs associated with floor/ceiling - and not specifically pixel drawing - which I'm trying to do something with.

calimero wrote:25% of CPU time for C2P... :( I was convinced that it was less, but I was wrong!
so basically Amiga AGA demos had a penalty of 25% on the CPU while doing 3D texture mapped graphics. So the Falcon should be better for 3D/texturing, but I also see great demos on AGA (though it is hard to compare, since the low-end spec for Amiga 1200 3D demos is usually 030/50Mhz... 2.4x faster than the Falcon). offtopic: what is the most impressive 3D AGA demo for a stock Amiga1200 (with fast ram)?


It's not quite that straightforward - c2p benefits from fastram and Amiga blitter can do half of the work while the CPU does the rest... and so on. And it depends on the image size and the number of colours/planes being converted.

For 030 alone, with just slow 16bit STRam and converting all 8 bitplanes for most of the screen - not ideal. A faster Falcon-specific version could be brewed but I can't imagine one fast enough that it would beat TC rendering as it is.

