calvinmorrow wrote: Memory Read is a MOV %(ebx), %eax...
I might be completely wrong on this, but to me it looks like a memory write: the first parameter is the destination (the memory address stored in register ebx) and the source is the second parameter, the value of register eax. My knowledge of asm is rusty, so please correct me if I'm wrong. It would also make more sense to me that reading takes fewer cycles than writing.
Re: ao486 Performance Technical Discussion
-
- Atariator
- Posts: 17
- Joined: Thu Oct 31, 2019 6:17 pm
Re: ao486 Performance Technical Discussion
In the (more common?) Intel Syntax that would be correct, but apparently gcc uses AT&T by default.
From: https://www.ibiblio.org/gferg/ldp/GCC-I ... HOWTO.html
GCC uses AT&T asm syntax. This is a little bit different from the regular
Intel format. The main differences are:
* AT&T syntax uses the opposite order for source and destination operands,
source followed by destination.
Re: ao486 Performance Technical Discussion
calvinmorrow wrote: .. but apparently gcc uses AT&T by default..
Thanks, was not aware of that.
Re: ao486 Performance Technical Discussion
I posted the C code and assembly I'm using here: https://gist.github.com/calvinmorrow/90 ... dd9af1ca64
After getting the code back to my regular Linux machine, I recompiled it and ran it for a quick sanity check. There are some minor differences ... no -march=i486 since I'm on 64-bit, increased operations *10, and I had to disable the leal ... wrong register error or something on x86-64.
The main reason for running locally vs ao486 was to double check that my code produced the result I would have expected, that memory reads would be slightly faster than memory writes. On my machine that seems to be the case, which only makes the ao486 result stand out more.
Code:
gcc -o bench -O2 -funroll-loops bench.c
./bench
1000000000 NOP Operations in 88 Milliseconds
1000000000 MEMREAD Operations in 131 Milliseconds
1000000000 MEMWRITE Operations in 259 Milliseconds
1000000000 ADD Operations in 257 Milliseconds
1000000000 SUBTRACT Operations in 252 Milliseconds
Re: ao486 Performance Technical Discussion
Wow.. that's a weird assembler.
What are you using for your timer when you are doing your tests? I have a LOT of experience with the PC and assembly code. I wrote FUSION-PC, which is a Mac emulator for the PC. It was 1.7 million lines of assembly code. I also wrote PCx, which was the first Intel Pentium based PC emulator for the Amiga and that was 1.6 million lines of 68K assembly. I would be happy to help with speed improvements. I found that making things native (like the video and video BIOS) made a huge improvement to the overall speed of my emulation.
I am the flux ninja
Re: ao486 Performance Technical Discussion
My replies are a bit slow atm because most of them are getting stuck awaiting moderator approval (new account and links probably). I posted the code, url inbound shortly. Would certainly welcome the help and any improvements!
Re: ao486 Performance Technical Discussion
JimDrew wrote: What are you using for your timer when you are doing your tests?
Originally I started down the route of trying to use the RDTSC instruction so I could get an accurate cycle count while performing the operational loop. I actually had a working (on my PC) implementation, only to find out that the instruction wasn't added until the 586.
At the moment I'm using the clock() function from C's time.h and trying to have a large enough opcode loop to minimize the impact of the limited clock/timer accuracy. My brief search didn't turn up many good timer options in the 486's architecture.
Re: ao486 Performance Technical Discussion
Besides the performance, I think something is wrong with either interrupt disabling or, specifically, keyboard controller disabling.
When the PC is booting and a RAM expander (QEMM, for example) is loading, pressing any key at that moment will most likely crash it. DOS4GW apps are also affected by this issue. It happens only while QEMM is loading (and while DOS4GW is loading/unloading).
Re: ao486 Performance Technical Discussion
Sorgelig wrote: Besides the performance, I think something is wrong with either interrupt disabling or, specifically, keyboard controller disabling.
When the PC is booting and a RAM expander (QEMM, for example) is loading, pressing any key at that moment will most likely crash it. DOS4GW apps are also affected by this issue. It happens only while QEMM is loading (and while DOS4GW is loading/unloading).
Possibly related, but a lot of the development tools provided with FreeDOS hang under DOS (also tried DOS 6.22) on ao486. Trying to run GCC, NASM, MASM, or a handful of other programs seemed to hang the core with a reset being the only option. I was only able to get those tools to run (and the programs they compiled) under Windows 95 with the FreeDOS VHD mounted as a second drive.
Those same tools ran under a DOS VM with a modern processor, even when restricted to presenting a 486 CPU.
Re: ao486 Performance Technical Discussion
The BIGGEST problem I have seen with a LOT of various programs (like MASM, which is what I use) are the missing CPU instructions (CMPXCHG8B, MOV to/from control register, etc.) and especially because there is no FPU! The lack of a FPU kills a TON of stuff because every 486 and later had a FPU built-in, so most everything used the FPU either deliberately or unknowingly through a library call that relies on the FPU.
There are definitely interrupt issues. There is something wrong with the interrupt controller hardware. This seems to affect everything in the system. It's like once an interrupt occurs the entire system is held off for some period of time. This results in some recursion that hangs the system.
Re: ao486 Performance Technical Discussion
JimDrew wrote: because every 486 and later had a FPU built-in
Not every. Most i486SX had no FPU.
You are free to contribute to opensource project. Currently you only suck the money from MiSTer. So, you are definitely not the one who can complain.
Re: ao486 Performance Technical Discussion
I've been trying to formulate a hypothesis as to why memory reads would be considerably slower than writes in ao486. The best idea I have at the moment (based on my limited understanding) is a potential issue in the TLB that would manifest as TLB thrashing.
At the moment I'm combing through the memory management code trying to get a grasp on how it operates with particular attention to the TLB. If anyone knows what else I should be paying attention to I'm certainly open to pointers, otherwise I'm going to do my best to get as good of an understanding as possible and then try to decide how best to run some tests.
Re: ao486 Performance Technical Discussion
We really didn't use the SX versions in the U.S. We basically went from the 386, briefly to the 486, and straight to the Pentium (math bug and all). There were a slew of programs that required the FPU to run by the time the 486 was popular.
I am certainly not complaining. I was pointing out that the lack of the FPU is probably the biggest issue with the core's compatibility, followed by the interrupt controller. I am not "sucking money out of MiSTer". I am selling 95%+ of my products to universities and students who are using the DE-10 as an educational tool. It's the reason why I am working with Terasic on the SDRAM reliability differences between different DE-10 Nano boards.
Re: ao486 Performance Technical Discussion
calvinmorrow wrote: I've been trying to formulate a hypothesis as to why memory reads would be considerably slower than writes in ao486. The best idea I have at the moment (based on my limited understanding) is a potential issue in the TLB that would manifest as TLB thrashing.
At the moment I'm combing through the memory management code trying to get a grasp on how it operates with particular attention to the TLB. If anyone knows what else I should be paying attention to I'm certainly open to pointers, otherwise I'm going to do my best to get as good of an understanding as possible and then try to decide how best to run some tests.
With a sequential read of memory, the cache doesn't help because the data is not in the cache yet. So in this case it will hit the DDR3 latency and thus be slow. Prefetching the data would probably speed it up. Or switch to SDRAM.
Re: ao486 Performance Technical Discussion
Sorgelig wrote: With a sequential read of memory, the cache doesn't help because the data is not in the cache yet. So in this case it will hit the DDR3 latency and thus be slow. Prefetching the data would probably speed it up. Or switch to SDRAM.
The read test I ran dereferenced the same RAM address 100000000 times, rather than reading sequentially, and took 3-4x as long as the writes.
I don't think prefetching has been implemented, as far as I can tell, but with it enabled we should be able to get roughly the same performance as the memory writes (or slightly better), correct? I was expecting the reads to run closer to 6-8 seconds rather than almost 30.
Re: ao486 Performance Technical Discussion
Sorgelig wrote: With a sequential read of memory, the cache doesn't help because the data is not in the cache yet. So in this case it will hit the DDR3 latency and thus be slow. Prefetching the data would probably speed it up. Or switch to SDRAM.
calvinmorrow wrote: The read test I ran dereferenced the same RAM address 100000000 times, rather than reading sequentially, and took 3-4x as long as the writes.
I don't think prefetching has been implemented, as far as I can tell, but with it enabled we should be able to get roughly the same performance as the memory writes (or slightly better), correct? I was expecting the reads to run closer to 6-8 seconds rather than almost 30.
No, pre-fetch is not implemented.
After some more precise exploration of the memory access, I think DDR3 is not used effectively. Although the address translator can use bursts of up to 256 words, it doesn't look like it does. The clients use only 4-word bursts of 32 bits, which means only a 2x burst on the 64-bit DDR3 bus. Pretty ineffective. It needs either an external cache, like on old mainboards, or SDRAM.
With the 128MB SDRAM module it's now possible to switch without losing memory size.
Re: ao486 Performance Technical Discussion
I would like to have an option to change the amount of RAM available to the ao486 core. Currently it is 64 MB, which is too much for DOS games like Aladdin (even with EMS RAM enabled via EMM386, it is too much). I would like to have 1 MB, 2 MB, 4 MB, 8 MB, 16 MB, 32 MB and 64 MB options for better compatibility with games.
Last edited by Lroby74 on Mon Nov 04, 2019 7:04 am, edited 1 time in total.
Re: ao486 Performance Technical Discussion
JimDrew wrote: We really didn't use the SX versions in the U.S. We basically went from the 386, briefly to the 486, and straight to the Pentium (math bug and all). There were a slew of programs that required the FPU to run by the time the 486 was popular.
I am certainly not complaining. I was pointing out that the lack of the FPU is probably the biggest issue with the core's compatibility, followed by the interrupt controller. I am not "sucking money out of MiSTer". I am selling 95%+ of my products to universities and students who are using the DE-10 as an educational tool. It's the reason why I am working with Terasic on the SDRAM reliability differences between different DE-10 Nano boards.
That's not true at all, you must have been out of the country for 5 years or so. I had a Gateway 2000 486 SX 25 MHz as my first 486. It wasn't until years later that I upgraded to a 486 DX 100. I knew a lot of other people with the same CPU at the time.
-
- Obsessive compulsive Atari behavior
- Posts: 133
- Joined: Wed Aug 03, 2005 11:45 am
- Location: Ohio, USA
Re: ao486 Performance Technical Discussion
JimDrew wrote: We really didn't use the SX versions in the U.S. We basically went from the 386, briefly to the 486, and straight to the Pentium (math bug and all). There were a slew of programs that required the FPU to run by the time the 486 was popular.
I am certainly not complaining. I was pointing out that the lack of the FPU is probably the biggest issue with the core's compatibility, followed by the interrupt controller. I am not "sucking money out of MiSTer". I am selling 95%+ of my products to universities and students who are using the DE-10 as an educational tool. It's the reason why I am working with Terasic on the SDRAM reliability differences between different DE-10 Nano boards.
kitrinx wrote: That's not true at all, you must have been out of the country for 5 years or so. I had a Gateway 2000 486 SX 25 MHz as my first 486. It wasn't until years later that I upgraded to a 486 DX 100. I knew a lot of other people with the same CPU at the time.
That was my experience as well, at least in central New York: 486SX-25 chips everywhere, which was odd, as they weren't all that much faster than a 386DX-40, given the price delta...
US based seller MiSTer Expansion Boards and Atari items
https://www.legacypixels.com/
Re: ao486 Performance Technical Discussion
JimDrew wrote: We really didn't use the SX versions in the U.S. We basically went from the 386, briefly to the 486, and straight to the Pentium (math bug and all). There were a slew of programs that required the FPU to run by the time the 486 was popular.
I am certainly not complaining. I was pointing out that the lack of the FPU is probably the biggest issue with the core's compatibility, followed by the interrupt controller. I am not "sucking money out of MiSTer". I am selling 95%+ of my products to universities and students who are using the DE-10 as an educational tool. It's the reason why I am working with Terasic on the SDRAM reliability differences between different DE-10 Nano boards.
kitrinx wrote: That's not true at all, you must have been out of the country for 5 years or so. I had a Gateway 2000 486 SX 25 MHz as my first 486. It wasn't until years later that I upgraded to a 486 DX 100. I knew a lot of other people with the same CPU at the time.
Poobah wrote: That was my experience as well, at least in central New York: 486SX-25 chips everywhere, which was odd, as they weren't all that much faster than a 386DX-40, given the price delta...
This disconnect might have to do with how the market was segmented at the time. Brand new systems were routinely sold based on CPUs one or two generations old because each new generation typically launched with only a high-end chip targeted at business users. Going down the price scale was basically going back in time. So if you were doing stuff like CAD or software development, you could probably afford and seriously benefit from a 486DX ca. 1990 and a Pentium in 1993. But if you were a home user on a budget, you might very well have bought a 486SX-based system as late as 1995.
Re: ao486 Performance Technical Discussion
Poobah wrote: That was my experience as well, at least in central New York: 486SX-25 chips everywhere, which was odd, as they weren't all that much faster than a 386DX-40, given the price delta...
That's precisely why they were not popular in the U.S. Gateway immediately dropped them after releasing systems with them and switched to non-SX and the Pentium.
Re: ao486 Performance Technical Discussion
Poobah wrote: That was my experience as well, at least in central New York: 486SX-25 chips everywhere, which was odd, as they weren't all that much faster than a 386DX-40, given the price delta...
The 486SX was commonly paired with the (P24T) "Pentium Overdrive" socket, which contained an extra row of pins unused by the 486SX CPU.
The pretext was that people would buy a lower-cost 486sx entry-level system with the promise it could later be upgraded to a (not yet available) Pentium CPU variant.
Intel just never made the P24T a competitive option as far as price...
-
- Atariator
- Posts: 23
- Joined: Tue Jul 18, 2017 8:31 am
- Location: Singapore
Re: ao486 Performance Technical Discussion
Note that there is also the v586 core, with quite a lot of work done, from what we can read at
https://opencores.org/projects/v586 ... to keep in mind, some modules can be re-used.
Brahim HAMADI CHAREF:: Singapore
Re: ao486 Performance Technical Discussion
There is also Zet: http://zet.aluzina.org/index.php/Zet_processor
It seems less mature than the other cores, but maybe it has some code bits that could be useful.
Re: ao486 Performance Technical Discussion
softtest1 wrote: There is also Zet: http://zet.aluzina.org/index.php/Zet_processor
It seems less mature than the other cores, but maybe it has some code bits that could be useful.
So this is a 486?
It can boot successfully MS-DOS 6.22, FreeDOS 1.1 and run Microsoft Windows 3.0 and other MS-DOS games.