Have the Atari Falcon? Please run it for me!

Vol
Atarian
Posts: 3
Joined: Wed May 12, 2021 10:00 am

Have the Atari Falcon? Please run it for me!

Post by Vol »

I maintain a project and am still missing data from the Atari Falcon. :( Could you please help me with this? Just run PI-ST and PI-ST30 for 100, 1000, and 3000 digits. The programs print the digits of π and the elapsed time in seconds. I need those timing values. Please use the main CPU of the Atari Falcon (the 68030 at 16 MHz). Thank you very much in advance.
Arne
Atari Super Hero
Posts: 746
Joined: Thu Nov 01, 2007 10:01 am

Re: Have the Atari Falcon? Please run it for me!

Post by Arne »

You know about the system architecture of the Falcon? CPU access to ST-RAM is influenced by the screen resolution/colour depth. I get different results depending on these.
czietz
Hardware Guru
Posts: 1501
Joined: Tue May 24, 2016 6:47 pm

Re: Have the Atari Falcon? Please run it for me!

Post by czietz »

Arne wrote: Wed May 12, 2021 12:46 pm You know about the system architecture of the Falcon? CPU access to ST-RAM is influenced by the screen resolution/colour depth. I get different results depending on these.
BTW, the CoreMark results show this effect quite well. E.g., compare "Falcon @ 16 MHz in ST high" vs. "Falcon @ 16 MHz in Truecolor" in this table: https://github.com/czietz/coremark/wiki/Results
Cyprian
10 GOTO 10
Posts: 2220
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: Have the Atari Falcon? Please run it for me!

Post by Cyprian »

You can find some figures for the Falcon here: https://www.atari-forum.com/viewtopic.php?f=27&t=23543

Code:

640x400x2C RGB 68030-16MHz 68882-32MHz 
Linear 32bit read (ST-Ram)   -> 5.475 MByte/sec (~103%)
Linear 32bit write (ST-Ram)  -> 6.660 MByte/sec (~103%)
Linear 32bit copy (ST-Ram)   -> 3.336 MByte/sec (~103%)

640x400xTC RGB 68030-16MHz 68882-32MHz
Linear 32bit read (ST-Ram)   -> 4.113 MByte/sec (~77%)
Linear 32bit write (ST-Ram)  -> 4.983 MByte/sec (~77%)
Linear 32bit copy (ST-Ram)   -> 2.501 MByte/sec (~77%)
czietz
Hardware Guru
Posts: 1501
Joined: Tue May 24, 2016 6:47 pm

Re: Have the Atari Falcon? Please run it for me!

Post by czietz »

I just ran your program on Hatari (and not on the real Falcon), and, therefore, cannot give you reliable numbers, yet. But I wonder: You only time the calculation of the digits of pi, don't you? If you did actually include printing the 100 - 3000 digits to the screen in your benchmark, the results would be influenced/biased by the speed of the text output routine. I.e., it would make a difference whether you ran the benchmark when NVDI is loaded. Therefore, whoever provides you timing values has to specify their detailed system configuration (OS version, screen resolution, whether any 3rd party VDI is loaded, ...); otherwise the results have limited meaning.
Arne
Atari Super Hero
Posts: 746
Joined: Thu Nov 01, 2007 10:01 am

Re: Have the Atari Falcon? Please run it for me!

Post by Arne »

czietz wrote: Wed May 12, 2021 4:45 pm If you did actually include printing the 100 - 3000 digits to the screen in your benchmark, the results would be influenced/biased by the speed of the text output routine.
I noticed that, too (on a stock F030), especially when it comes to scrolling with 1000/3000 iterations. He included the VASM source, but I guess he used either Bconout() or Cconout(). And with NVDI (or similar) things will become weird!
If printing during calculation is switched off the bandwidth for screen-memory reads is lost and will result in varying results.
Last edited by Arne on Thu May 13, 2021 7:49 am, edited 1 time in total.
Vol
Atarian
Posts: 3
Joined: Wed May 12, 2021 10:00 am

Re: Have the Atari Falcon? Please run it for me!

Post by Vol »

Arne wrote: Wed May 12, 2021 12:46 pm You know about the system architecture of the Falcon? CPU access to ST-RAM is influenced by the screen resolution/colour depth. I get different results depending on these.
I only know that the Falcon has an unusual architecture, so it is very interesting to get results from this machine. I suggest tuning the system so that it can show its maximum performance. If code for this tuning were provided, I would include it in the programs.
czietz wrote: Wed May 12, 2021 4:45 pm I just ran your program on Hatari (and not on the real Falcon), and, therefore, cannot give you reliable numbers, yet. But I wonder: You only time the calculation of the digits of pi, don't you? If you did actually include printing the 100 - 3000 digits to the screen in your benchmark, the results would be influenced/biased by the speed of the text output routine. I.e., it would make a difference whether you ran the benchmark when NVDI is loaded. Therefore, whoever provides you timing values has to specify their detailed system configuration (OS version, screen resolution, whether any 3rd party VDI is loaded, ...); otherwise the results have limited meaning.
Thank you. You know Hatari is very inaccurate for the 68020+. :( So I need more reliable data. Of course, digit printing affects the timings, but this is a minor factor if you print 3000 digits. Moreover, the table gives us the option of showing separate data for the CPU and I/O timings.
Arne wrote: Wed May 12, 2021 5:43 pm I noticed that, too (on a stock F030), especially when it comes to scrolling with 1000/3000 iterations. He included the VASM source, but I guess he used either Bconout() or Cconout(). And with NVDI (or similar) things will become weird!
If printing during calculation is switched off the bandwidth for screen-memory reads is lost and will result in varying results. Even running the program from AltRAM will not help.
Sorry, I don't understand. What is NVDI? Could you also please clarify your phrase "If printing during calculation is switched off the bandwidth for screen-memory reads is lost and will result in varying results"?
AnthonyJ
Atari maniac
Posts: 75
Joined: Sat Jan 26, 2013 8:16 am

Re: Have the Atari Falcon? Please run it for me!

Post by AnthonyJ »

Vol wrote: Thu May 13, 2021 6:11 am I suggest tuning the system so that it can show its maximum performance. If code for this tuning were provided, I would include it in the programs.
If your goal is to compare the performance of the Falcon against other systems such as the Amiga 1200, I would guess making use of the DSP would constitute "tuning it to show the maximum performance", especially since this is the most unusual part of the Falcon's architecture (an asymmetric multi-processor system). That's probably quite some work though.
Arne
Atari Super Hero
Posts: 746
Joined: Thu Nov 01, 2007 10:01 am

Re: Have the Atari Falcon? Please run it for me!

Post by Arne »

Vol wrote: Thu May 13, 2021 6:11 am I only know that the Falcon has an unusual architecture, so it is very interesting to get results from this machine. I suggest tuning the system so that it can show its maximum performance. If code for this tuning were provided, I would include it in the programs.
If the character printing is included in the overall pi calculation, then the results between machines are not comparable at all. A C64 has a text mode, and printing there is not as time-consuming (in comparison) as in a graphics mode. But 68K Ataris do not have a text mode; they always run in a graphics mode. Here TOS needs more effort to print a single character, i.e. printing/scrolling is a pretty bad idea for a pure calculation benchmark :!:
Vol wrote: Thu May 13, 2021 6:11 am Sorry, I don't understand. What is NVDI?
https://lmgtfy.app/#gsc.tab=0&gsc.q=nvdi%20atari
Vol wrote: Thu May 13, 2021 6:11 am Could you also please clarify your phrase "If printing during calculation is switched off the bandwidth for screen-memory reads is lost and will result in varying results"?
Assume a screen resolution of 640x400x2colours. That's exactly 32000 bytes of screen memory that have to be sent to the CRT/TFT (let's say) 60 times a second. Now if you switch to 640x480x256colours this results in 307200 bytes to be sent to the screen 60 times a second - almost 10 times as much.
As RAM can only be accessed with a limited bandwidth and you cannot "throttle" the screen memory access, the screen read-out bandwidth calculated above is subtracted from the total available bandwidth, i.e. the CPU gets access to RAM less often. And where is your program/data located?
AFAIK Videl (the video controller) cannot be switched off like it's done in the C128 in 2 MHz mode. But if I am wrong, some Falcon expert could clarify here.
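The arithmetic above can be written out in a few lines of Python. This is only a back-of-the-envelope sketch: the 60 Hz refresh rate is the "let's say" figure from the post, the function name `video_fetch_rate` is made up for this illustration, and it counts raw pixel data only, ignoring how Videl actually schedules its bus accesses.

```python
def video_fetch_rate(width, height, bits_per_pixel, refresh_hz=60):
    """Return (bytes per frame, bytes per second) read for screen refresh.

    refresh_hz=60 is an assumed round figure, not a measured value.
    """
    frame_bytes = width * height * bits_per_pixel // 8
    return frame_bytes, frame_bytes * refresh_hz


# 2 colours = 1 bit per pixel, 256 colours = 8 bits per pixel
mono_frame, mono_rate = video_fetch_rate(640, 400, 1)
c256_frame, c256_rate = video_fetch_rate(640, 480, 8)

if __name__ == "__main__":
    print(mono_frame, mono_rate)     # 32000 1920000
    print(c256_frame, c256_rate)     # 307200 18432000
    print(c256_frame / mono_frame)   # 9.6
```

The ratio of 9.6 is the "almost 10 times as much" mentioned above.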
simonsunnyboy
Moderator
Posts: 5349
Joined: Wed Oct 23, 2002 4:36 pm
Location: Friedrichshafen, Germany

Re: Have the Atari Falcon? Please run it for me!

Post by simonsunnyboy »

In short, do not access any I/O including screen output while performing any benchmark if the goal is comparison.
czietz
Hardware Guru
Posts: 1501
Joined: Tue May 24, 2016 6:47 pm

Re: Have the Atari Falcon? Please run it for me!

Post by czietz »

simonsunnyboy wrote: Thu May 13, 2021 7:30 am In short, do not access any I/O including screen output while performing any benchmark if the goal is comparison.
This is why benchmarks such as CoreMark (Atari port here) by design do not call any library or OS functions within the timed benchmark code.
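The general pattern of keeping output outside the timed region can be sketched as follows. This is a hypothetical Python illustration, not how CoreMark or the PI-ST programs are written: `timed_run`, `sum_of_squares` and the `StringIO` stand-in for the screen are all invented for the example.

```python
import io
import time


def timed_run(compute, emit):
    """Run compute() and then emit(result), timing the two phases separately."""
    t0 = time.perf_counter()
    result = compute()          # no output inside the timed calculation
    t1 = time.perf_counter()
    emit(result)                # output is timed on its own
    t2 = time.perf_counter()
    return result, t1 - t0, t2 - t1


def sum_of_squares():
    # Stand-in workload (hypothetical); any pure calculation would do.
    return sum(i * i for i in range(100_000))


sink = io.StringIO()            # stands in for the screen
result, cpu_s, io_s = timed_run(sum_of_squares, lambda r: sink.write(f"{r}\n"))

if __name__ == "__main__":
    print(f"calculation: {cpu_s:.6f} s, output: {io_s:.6f} s")
```

Reporting the two figures separately lets a reader discard the output time when comparing machines with very different text output paths.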

While I still question the comparability of these pi benchmark results, here are the times, as measured on a real Falcon with PI-ST30.TOS. CPU running at 16 MHz. Video mode was "ST high", which has a comparatively low memory bandwidth requirement.

With TOS 4.04:
100 dig. 0.07 s
1000 dig. 3.40 s
3000 dig. 28.11 s

With EmuTOS:
100 dig. 0.07 s
1000 dig. 3.36 s
3000 dig. 27.93 s

As noted above, I would expect the numbers to change more drastically if software that accelerates screen output (e.g., NVDI) is installed.
Arne
Atari Super Hero
Posts: 746
Joined: Thu Nov 01, 2007 10:01 am

Re: Have the Atari Falcon? Please run it for me!

Post by Arne »

czietz wrote: Thu May 13, 2021 8:06 am This is why benchmarks such as CoreMark (Atari port here) by design do not call any library or OS functions within the timed benchmark code.
That's how it should be done!
czietz wrote: Thu May 13, 2021 8:06 am (...) here are the times, as measured on a real Falcon with PI-ST30.TOS. CPU running at 16 MHz. Video mode was "ST high", which has a comparatively low memory bandwidth requirement.
My results for the above mentioned config are:

With TOS 4.04:
100 dig. 0.07 s
1000 dig. 3.37 s
3000 dig. 28.05 s

With TOS 4.04 + NVDI 5.03
100 dig. 0.06 s
1000 dig. 3.26 s
3000 dig. 27.69 s
czietz wrote: Thu May 13, 2021 8:06 am As noted above, I would expect the numbers to change more drastically if software that accelerates screen output (e.g., NVDI) is installed.
See above. I am surprised but I don't know if any relevant (for this benchmark) part of NVDI can be tuned. But my subjective perception always has been that screen output was faster on earlier versions (like 2.x). But it's been decades since I used Ataris on a daily basis.
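To put the four 3000-digit runs posted so far side by side, here is a small Python sketch. The timings are copied from the posts above; taking czietz's TOS 4.04 run as the reference is an arbitrary choice, and the runs come from two different machines, so the comparison is only a rough one.

```python
# 3000-digit timings in seconds, copied from the posts above
timings_3000 = {
    "TOS 4.04 (czietz)": 28.11,
    "EmuTOS (czietz)": 27.93,
    "TOS 4.04 (Arne)": 28.05,
    "TOS 4.04 + NVDI 5.03 (Arne)": 27.69,
}

baseline = timings_3000["TOS 4.04 (czietz)"]  # arbitrary reference run


def percent_faster(seconds, reference=baseline):
    """How much shorter a run is than the reference, in percent."""
    return (reference - seconds) / reference * 100.0


# Difference between the slowest (reference) and the fastest run
spread = percent_faster(min(timings_3000.values()))

if __name__ == "__main__":
    for name, seconds in timings_3000.items():
        print(f"{name}: {percent_faster(seconds):.2f}% faster than reference")
    print(f"largest difference: {spread:.2f}%")
```

The whole spread comes out at roughly 1.5%, which matches the impression that NVDI changes little for this program.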
Vol
Atarian
Posts: 3
Joined: Wed May 12, 2021 10:00 am

Re: Have the Atari Falcon? Please run it for me!

Post by Vol »

AnthonyJ wrote: Thu May 13, 2021 7:14 am If your goal is to compare the performance of the Falcon against other systems such as the Amiga 1200, I would guess making use of the DSP would constitute "tuning it to show the maximum performance", especially since this is the most unusual part of the Falcon's architecture (an asymmetric multi-processor system). That's probably quite some work though.
Rather, I am gathering data. Comparing performance is just one way to use this data. Of course, it would be very interesting to implement the pi spigot for the DSP, but I doubt that it is possible. The code for the main loop is quite short for the 68k:

Code:

.l2      sub.l d6,d5      ;d - (d%b)
         sub.l d7,d5      ;... - (d/b) = (d/b)*(b-1)
         lsr.l d5         ;d <- (d/b)*(b-1)/2
.l4      move.w -(a3),d0  ;r[i]
         mulu.w d1,d0     ;r[i]*10000
         add.l d0,d5      ;d <- d + r[i]*10000
         move.l d5,d6
         divu.w d4,d6     ;d/b: d6 = remainder:quotient
         bvs.s .longdiv   ;this branch is taken very rarely

         move.w d6,d7     ;d7 <- d/b
         clr.w d6
         swap d6          ;d6 <- d%b
         move.w d6,(a3)   ;r[i] <- d%b
         subq.w #2,d4     ;i <- i - 1
         bcc.s .l2        ;the main loop
If somebody could help me convert this code to the DSP, that would be nice.
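For readers who do not read 68k assembly, the same inner loop can be sketched in Python. This is a minimal sketch of the widely circulated base-10000 spigot for π, not a translation of the PI-ST source: the names `pi_digits`, `r` and `carry`, and the array size of 14 entries per 4 digits, are assumptions taken from that common C version. The rarely taken `.longdiv` path has no counterpart here because Python integers do not overflow, and there is no handling for a group that would overflow four digits. As far as the listing shows, `d5` plays the role of `d` and `d4` the divisor `b = 2i-1`.

```python
def pi_digits(n):
    """Return n decimal digits of pi as a string (n a multiple of 4)."""
    base = 10000
    size = n // 4 * 14                  # assumed: 14 terms per 4-digit group
    r = [base // 5] * size + [0]        # top slot 0, as in the common C version
    carry = 0
    groups = []
    for c in range(size, 0, -14):       # one pass per group of 4 digits
        d = 0
        for i in range(c, 0, -1):
            d += r[i] * base            # d <- d + r[i]*10000
            b = 2 * i - 1
            d, r[i] = divmod(d, b)      # r[i] <- d % b, d <- d / b
            if i > 1:
                d *= i - 1              # the sub/sub/lsr step in the listing
        groups.append("%04d" % (carry + d // base))
        carry = d % base
    return "".join(groups)


if __name__ == "__main__":
    print(pi_digits(800)[:32])
```

Called as `pi_digits(800)`, it returns an 800-character string beginning with 31415926535897932384626433832795.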
Arne wrote: Thu May 13, 2021 7:17 am If the character printing is included in the overall pi calculation, then the results between machines are not comparable at all. A C64 has a text mode, and printing there is not as time-consuming (in comparison) as in a graphics mode. But 68K Ataris do not have a text mode; they always run in a graphics mode. Here TOS needs more effort to print a single character, i.e. printing/scrolling is a pretty bad idea for a pure calculation benchmark
It is not a pure benchmark, it is a calculator for the digits of pi. This calculator may be used for some benchmarking. :)
Of course, different systems have different printing overheads. The largest overhead is on systems which use a TCP/IP connection such as telnet. BTW, Amiga character output is slower than the Atari ST's. ;) However, I dare to repeat that this overhead is only a small fraction of the total time when we compute 3000 digits. You can choose to show separate data for the CPU and I/O timings. I have attached a picture which shows the latest results in Total/CPU format. You can see that the CPU timings for all four Falcon cases differ by less than 0.5%.
falcon-sheet.png
BTW, thank you for the hint about NVDI. The existence of such good system software for Atari systems impresses me very much.
Arne wrote: Thu May 13, 2021 7:17 am Assume a screen resolution of 640x400x2colours. That's exactly 32000 bytes of screen memory that have to be sent to the CRT/TFT (let's say) 60 times a second. Now if you switch to 640x480x256colours this results in 307200 bytes to be sent to the screen 60 times a second - almost 10 times as much.
As RAM can only be accessed with a limited bandwidth and you cannot "throttle" the screen memory access, the screen read-out bandwidth calculated above is subtracted from the total available bandwidth, i.e. the CPU gets access to RAM less often. And where is your program/data located?
AFAIK Videl (the video controller) cannot be switched off like it's done in the C128 in 2 MHz mode. But if I am wrong, some Falcon expert could clarify here.
Of course you get different timings for different video modes, so I asked for the fastest video mode possible. However, the video mode doesn't affect the ER calculation, which is used as a CPU performance meter. So if somebody sends me data from a slower video mode, this does not affect the results too much. ;) It would be interesting to switch the video off. The 8-bit Atari allows us to do such a trick.
czietz wrote: Thu May 13, 2021 8:06 am While I still question the comparability of these pi benchmark results, here are the times, as measured on a real Falcon with PI-ST30.TOS. CPU running at 16 MHz. Video mode was "ST high", which has a comparatively low memory bandwidth requirement.

With TOS 4.04:
100 dig. 0.07 s
1000 dig. 3.40 s
3000 dig. 28.11 s

With EmuTOS:
100 dig. 0.07 s
1000 dig. 3.36 s
3000 dig. 27.93 s

As noted above, I would expect the numbers to change more drastically if software that accelerates screen output (e.g., NVDI) is installed.
Thank you very much. Could you please also run PI-ST.TOS? Timings from this program can help highlight the advantages of 68030 code over plain 68000 code.
Arne wrote: Thu May 13, 2021 8:35 am My results for the above mentioned config are:

With TOS 4.04:
100 dig. 0.07 s
1000 dig. 3.37 s
3000 dig. 28.05 s

With TOS 4.04 + NVDI 5.03
100 dig. 0.06 s
1000 dig. 3.26 s
3000 dig. 27.69 s

See above. I am surprised but I don't know if any relevant (for this benchmark) part of NVDI can be tuned. But my subjective perception always has been that screen output was faster on earlier versions (like 2.x). But it's been decades since I used Ataris on a daily basis.
Thank you very much. The table is updated. May I also ask you to run PI-ST.TOS? You can see that the data for the Amiga 1200 come in two variants, which were produced by the two programs PI-AMIGA and PI-AMIGA1200.
ThorstenOtto
Atari God
Posts: 1477
Joined: Sun Aug 03, 2014 5:54 pm

Re: Have the Atari Falcon? Please run it for me!

Post by ThorstenOtto »

Arne wrote: Thu May 13, 2021 8:35 am my subjective perception always has been that screen output was faster on earlier versions (like 2.x).
Somewhat off-topic, but: there is only a minimal overhead of NVDI 5.x compared to 2.51, because the driver was split in two parts. The actual drawing routines are almost identical.