4096 colors


nativ
Fuji Shaped Bastard
Posts: 4114
Joined: Mon Jul 30, 2007 10:26 am
Location: South West, UK

Re: 4096 colors

Post by nativ »

What is the highest colour 'gaming' screen possible? Or, to put it another way, what is the lowest version of PCS/Spectrum that could be used for a regular game?

Glad to hear it's all coming together, d.m.l.
Atari STFM 512 / STe 4MB / Mega ST+DSP / Falcon 4MB 16Mhz 68882 - DVD/CDRW/ZIP/DAT - FDI / Jaguar / Lynx 1&2 / 7800 / 2600 / XE 130+SD Card // Sega Dreamcast / Mega2+CD2 // Apple G4

http://soundcloud.com/nativ ~ http://soundcloud.com/nativ-1 ~ http://soundcloud.com/knot_music
http://soundcloud.com/push-sounds ~ http://soundcloud.com/push-records
evil
Captain Atari
Posts: 285
Joined: Sun Nov 12, 2006 8:03 pm
Location: Devpac

Re: 4096 colors

Post by evil »

dml wrote: TBH my memory is beginning to come back - and it's likely I didn't try it on the ST/e machines at all. It would have been the Falcon, and for a plasma zoomer thingy (changing colour 0 only). I'm sure I still have the test somewhere here on the 030's HD.
Ok, doing hard-sync on Falcon would be close to hopeless I guess with all the different DMA things going on, not to mention the 31 kHz hsync (VGA) which would make the colour changes twice as long. But the real killer is the videl bug that produces noise on the screen while updating the palette.

dml wrote: If I had done it on the ST for PCS, I think it would have caused extra problems with colour count per line (line ending half way through a palette, needing multiple blits or different colour alloc rules per scanline) which I would definitely remember.
Yes, 56 colours not being a multiple of 16 is a little problem. We'd need to feed 48 colours with the blitter (during visible scanlines) and the remaining 8 colours with the CPU in the borders. Setting the source x/y increments and destination x/y increments correctly should also take away most of the blitter setup for each line (remember the blitter increments can be negative, so it will "loop" through the colour registers without any manual correction). The colour data would need to be organised in two buffers, one for the blitter and one for the CPU, to avoid having to correct the modulo.

Simple calc:

move.w d0,(a0) ;8 cycles (a0=$ffff8a38 (blitter rows))
move.b d1,(a1) ;8 cycles + 8 cycles (a1=$ffff8a3c (start blitter (Mega STe is 8+12)))
48 * 8 = 384 cycles for the blitter pass
4*move.l (a2)+,(a3)+ = 80 cycles (a2=colour data, a3=$ffff8240 (CPU pass))
move.l a4,a3 ;4 cycles (reset a3 to $ffff8240)

Total: 492 cycles (STe) and 496 on MSTe.

So unless I did a brainfart (I often do :)) it looks doable at least with 199 lines.
The lower border should be able to fit in 16 cycles if done perfectly (mixed in with the CPU code), and hence we are at exactly 512 cycles on MSTe and 508 on STe. Wehoo!
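
As a rough check of the arithmetic above, here is a minimal Python sketch that adds up the same per-line cycle costs. The cycle figures are taken from the post; the function and constant names are made up for illustration, and the 512-cycle scanline is the usual 8 MHz / 50 Hz figure that the post also arrives at.

```python
# Per-scanline cycle budget, using the figures quoted in the post.
SCANLINE_CYCLES = 512  # 68000 cycles per scanline at 8 MHz, 50 Hz


def line_budget(blitter_start_cycles, lower_border_cycles=0):
    """Sum the cost of one scanline of the blitter + CPU palette feed."""
    set_rows = 8        # move.w d0,(a0)   -> blitter row count
    blit = 48 * 8       # 48 colours, 8 cycles per word moved by the blitter
    cpu_pass = 4 * 20   # 4 x move.l (a2)+,(a3)+ for the remaining 8 colours
    reset_ptr = 4       # move.l a4,a3
    return (set_rows + blitter_start_cycles + blit
            + cpu_pass + reset_ptr + lower_border_cycles)


STE_START = 8 + 8    # move.b d1,(a1) plus start-up cost on STe
MSTE_START = 8 + 12  # same instruction on Mega STe

if __name__ == "__main__":
    for name, start in (("STe", STE_START), ("MSTe", MSTE_START)):
        plain = line_budget(start)
        with_border = line_budget(start, lower_border_cycles=16)
        print(name, plain, with_border, SCANLINE_CYCLES - with_border)
```

The last printed column is the slack left on the lower-border line, which is why the Mega STe case has no cycles to spare.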

dml wrote: I'm already looking at a portrait/landscape overscan mode for v5, which may work better now with the new colour allocation rules. I tried landscape mode years ago but the reproduction was poor because of the lower colour change density, with simplistic colour allocation causing streaks to appear in the image, and it gets worse as the palette bit depth is increased and competition increases. That should all be dealt with this time round.
Cool, looking forward to it :-)

dml wrote: A full overscan mode would be nice too but bottom border could be a pain - I can't remember where it occurs in the display frame and it probably requires a special ruleset to map colours on that line. I may try it but it will be much later - I'll see how the other modes go first. If it isn't going well I may change my mind :)
The lower border is mixed in with the left border killing. In my case I've decided to waste a little more CPU on the borders to save address registers, and a scanline looks like this on ST (each move is 12 cycles):

Code:

		; D7 = 2

		; NORMAL SCANLINE

		;left border
		move.b	d7,$ffff8260.w
		move.w	d7,$ffff8260.w
		dcb.w	88,$4e71
		;right border
		move.w	d7,$ffff820a.w
		move.b	d7,$ffff820a.w
		dcb.w	11,$4e71
		;stabilizer
		move.b	d7,$ffff8260.w
		move.w	d7,$ffff8260.w
		dcb.w	11,$4e71


		;LOWER BORDER SCANLINE

		;left and bottom
		move.w	d7,$ffff820a.w ; this instruction goes last on the previous scanline
		move.b	d7,$ffff8260.w
		move.w	d7,$ffff8260.w
		move.b	d7,$ffff820a.w
		dcb.w	85,$4e71
		;right border
		move.w	d7,$ffff820a.w
		move.b	d7,$ffff820a.w
		dcb.w	11,$4e71
		;stabilizer
		move.b	d7,$ffff8260.w
		move.w	d7,$ffff8260.w
		dcb.w	11,$4e71		
On the STe, it's not necessary to have the stabilizer code, saving some cycles.
That also reduces the linewidth from the odd 230 bytes to more even 224 bytes.
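
The scanline listings above can be sanity-checked by counting cycles. This is only a sketch in Python: it uses the 12 cycles per move stated in the post and 4 cycles per nop ($4e71), and it assumes (my reading of the comment in the listing) that the first move of the lower-border scanline executes at the end of the previous line, so only seven moves are counted on the lower-border line itself.

```python
MOVE_CYCLES = 12  # move.b / move.w d7,$ffffxxxx.w, as stated in the post
NOP_CYCLES = 4    # $4e71


def scanline_cycles(moves, nops):
    """Total 68000 cycles for a line made of register switches and nops."""
    return moves * MOVE_CYCLES + nops * NOP_CYCLES


# Normal scanline: 2 moves per switch (left, right, stabilizer).
normal = scanline_cycles(moves=6, nops=88 + 11 + 11)

# Lower-border scanline: one extra move.b after the left border switch,
# compensated by 85 nops instead of 88.
lower = scanline_cycles(moves=7, nops=85 + 11 + 11)

if __name__ == "__main__":
    print(normal, lower)
```

Both layouts come out at one full 512-cycle scanline, which is what keeps the switches aligned from line to line.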

I drew up a little page before to explain overscan and linewidths to some guys, might be useful:
http://ae.dhs.nu/overscan/

Also, wasting a few address registers can shave off more cycles, but I've yet to try timings on that to know exactly where we end up, probably around 4 cycles for each border and the stabilizer. That's because you need to pad with some nops, as the switches get too fast otherwise.
mc6809e
Captain Atari
Posts: 159
Joined: Sun Jan 29, 2012 10:22 pm

Re: 4096 colors

Post by mc6809e »

dml wrote: So are you saying that an additional / alternate form of sync is required before starting an external device, versus normal CPU bus cycles? and that it's due to the *internal* clock?

cheers
If you want writes to the palette registers to be perfectly timed so that they correspond to particular positions on the screen, and you're using HB interrupts, then yes, you'll have to take into account the E clock.

Simply reloading the palette registers at the end of a scanline on a HB interrupt with the blitter isn't much of a problem. Small variations in timing don't matter. But suppose you desire to reload the registers at a precise point on a scanline? The variation in interrupt latency is going to make it difficult to start the blitter at the precise moment.

But by precisely timing blits to palette registers, you can get color changes in the middle of a scanline exactly where you want them. And by properly setting up the destination increment registers, it's even possible to continuously write to all the palette registers over the entire length of a scanline.

Another trick is to set up the blitter to write to one particular palette register over and over again by using a destination X increment of -2. This creates a pseudo low-resolution background, 40 pixels wide with 40 colors per scanline (not 64, as I first wrote), and the remaining 15 colors can be used for sprites on top of this background. You might even try some other combinations, like updating the same 4 palette registers over and over for a 2-bitplane background and using the remaining 2 bitplanes for sprites.
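
To illustrate the looping write to all sixteen palette registers described above, here is a small Python model of how the blitter destination address advances. It is a sketch under one assumption about the hardware: the destination X increment is applied after every word except the last of a row, where the destination Y increment is applied instead. The function name and parameters are hypothetical.

```python
PALETTE_BASE = 0xFFFF8240  # first ST colour register
PALETTE_LAST = 0xFFFF825E  # sixteenth colour register


def blitter_dest_addresses(start, x_count, y_count, x_inc, y_inc):
    """Return the sequence of destination addresses the blitter writes to."""
    addr = start
    written = []
    for _ in range(y_count):
        for word in range(x_count):
            written.append(addr)
            is_last_word = word == x_count - 1
            # Y increment replaces X increment on the last word of a row.
            addr += y_inc if is_last_word else x_inc
    return written


if __name__ == "__main__":
    # 16 words per row, +2 between registers, -30 to wrap back to $ffff8240.
    addrs = blitter_dest_addresses(PALETTE_BASE, 16, 3, 2, -30)
    print([hex(a) for a in addrs[14:18]])
```

With a negative Y increment of -30 the address wraps from the last colour register back to the first, so one long blit keeps cycling through the palette without any CPU correction between rows.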
dml
Fuji Shaped Bastard
Posts: 3954
Joined: Sat Jun 30, 2012 9:33 am

Re: 4096 colors

Post by dml »

evil wrote: Ok, doing hard-sync on Falcon would be close to hopeless I guess with all the different DMA things going on, not to mention the 31 kHz hsync (VGA) which would make the colour changes twice as long. But the real killer is the videl bug that produces noise on the screen while updating the palette.
It would have been composite/TV mode I was using at the time for demo stuff (it was from a time before CRTs became quite dead and people still had them :). Still not quite clear on the details but I'll update if/when I find the test and figure out what it was doing and where it was going wrong, just in case it was an ST experiment after all.

(back to the CPU version for a bit...)
I notice that today's experiments with STEEM show the familiar 1-pixel vertical lines in PCS images when the colour allocation rules don't exclude recently changed palette indices. These lines are stable under STEEM but in my original experiments (old Mega4) this line would be noisy and subject to machine startup conditions. There's a 1-clock 'window of uncertainty' involved. I have attached a pic (excuse the terrible quality - just a simplified test).

without sync correction rule:
Image
with sync correction rule:
Image

(My vague recollection of the 'early blitter experiment' I referred to earlier resulted in something similar - but the window just appeared to be bigger and more noisy).
evil wrote: Yes, 56 colours not being a multiple of 16 is a little problem. We'd need to feed 48 colours with the blitter (during visible scanlines) and the remaining 8 colours with the CPU in the borders.
I can see various trades involved with selecting which occurs first - cpu or blit. Reg reloading is obviously best done inside the border - as early as possible, but starting a blit in the middle of a scan will take longer than starting a preloaded movem. Preloading registers then starting the blit early (similar to your example) is likely better. OTOH, leaving some registers free could help reduce this delay, at the cost of dropping movem or using a smaller movem. Not using a movem for the CPU part at all means *slightly* less change density for that section (more density is good, since you want most of the changes within the scanline if possible).

Anyway I'd probably start with your example because you have the timings there already - and improvements on that aren't likely to be big, if anything :)
evil wrote: So unless I did a brainfart (I often do :)) it looks doable at least with 199 lines.
Looks decent to me. It's worth trying. I'm still working on the new convertor, but after some fuss porting the image library (eek) I have it basically working now so I might shift back to display stuff for a bit once I'm happy most of the 'old functionality' is in.

I'm looking to refactor the display devices into handlers, such that new handlers can be built for new devices and contain all the code specific to that device. So it should be easier to add modes for machine variants, overscan modes, cpu speeds etc. for anyone with the lifespan and determination to fill that matrix out :-)
evil wrote: The lower border should be able to fit in 16 cycles if done perfectly (mixed in with the CPU code), and hence we are at exactly 512 cycles on MSTe and 508 on STe. Wehoo!
Yes that's neat - almost evil :-)
evil wrote: The lower border is mixed in with the left border killing. In my case I've decided to waste a little more CPU on the borders to save address registers, and a scanline looks like this on ST (each move is 12 cycles):
Thanks for all the samples, it will definitely speed things along I'm sure - I'll refer to/mention your input in the next version when I get that stuff working. I'll also link the relevant DHS notes writeups from the site.
evil wrote: I drew up a little page before to explain overscan and linewidths to some guys, might be useful:
http://ae.dhs.nu/overscan/
brilliant.
evil wrote: Also, wasting a few address registers can shave off more cycles, but I've yet to try timings on that to know exactly where we end up, probably around 4 cycles for each border and the stabilizer. That's because you need to pad with some nops, as the switches get too fast otherwise.
In order to avoid getting myself into a mess, I'd probably start with just getting a reliable reference working before shaving cycles here and there, it's likely to involve testing on real machines - which means waiting for that floppy emulator to arrive (to help with the cross-dev). I might try the overscan stuff in Steem for a laugh but not holding out a lot of hope that it will either work or do the same thing as a real machine. I'm impressed with it so far but this may be pushing it :)

I also still need to sort out an assembler to use along with gcc - either vasm or just put up with gas (yuck). I haven't checked the object format for vasm but if it's compatible with the gcc linker I'll just use it. Having to use devpac with cross-dev is uncomfortable. It's ok when you're working exclusively on the native machine but painful from the PC and my Ataris are still being put back together...

cheers.
dml
Fuji Shaped Bastard
Posts: 3954
Joined: Sat Jun 30, 2012 9:33 am

Re: 4096 colors

Post by dml »

...so I have uploaded an early version of the new convertor v5 which currently implements only the 'photochrome' (STE) mode with a fixed version of the original 'hatched' dithering.

It is still using a very simplistic colour allocator, but I have now sorted out the nasty strobing issue :-z. The dual dithering in the old version looks like it was actually wrong and I didn't notice. So the average intensity of the frames was not exactly equal and therefore flickered more than necessary. gaaah :-( So the DHS 'complaint' was accurate but not for the reasons assumed!

Anyway a 68k test build can be found here: http://www.leonik.net/dml/sec_pcs.py

It's pretty slow on an 8 MHz machine and uses tons of RAM - partly due to the FreeImage file format library being massive (it's especially slow if your input image is not exactly 320x200!) and bloating my executable. Experimenters may want to consider running it on something bigger (like an emulator :-) to get images out until I do something about that.

I may drop FreeImage if I can't prune the plugins down and make it more efficient, it just seemed like a handy way to get access to tons of file formats. This library is well used by games devs but it's easy to forget just how fat some of this stuff is compared with what we had 20 years ago!

I'm not putting up an x86 build yet because there are likely still some Intel/Motorola endian issues to fix. I did my best while writing the code but haven't tested it.
CORRECTION - the x86 build did actually work so it's now on the site too. You might need to install Cygwin though - I was in a hurry and didn't check.

Anyway, overall not bad I think for a few evenings' work, from scratch. I still have a few things to sort out before I start on the improvements but I think the output is already better than v4...
Last edited by dml on Mon Aug 27, 2012 8:39 pm, edited 1 time in total.
Cyprian
10 GOTO 10
Posts: 3170
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: 4096 colors

Post by Cyprian »

Wow, I really like topics like this :D

Dal, please share the source code for that converter.
About a year ago I was struggling with such a tool, but I got stuck on color quantization for Spectrum 512 color allocation (48 colors, changed every 12 pixels).
Actually I tried to implement a blitter Spectrum mode that changes color every 8 pixels. The asm routine was done, but I had no way to convert any image to that format...
Lynx I / Mega ST 1 / 7800 / Portfolio / Lynx II / Jaguar / TT030 / Mega STe / 800 XL / 1040 STe / Falcon030 / 65 XE / 520 STm / SM124 / SC1435
DDD HDD / AT Speed C16 / TF536 / SDrive / PAK68/3 / Lynx Multi Card / LDW Super 2000 / XCA12 / SkunkBoard / CosmosEx / SatanDisk / UltraSatan / USB Floppy Drive Emulator / Eiffel / SIO2PC / Crazy Dots / PAM Net
Hatari / Steem SSE / Aranym / Saint
http://260ste.atari.org
dml
Fuji Shaped Bastard
Posts: 3954
Joined: Sat Jun 30, 2012 9:33 am

Re: 4096 colors

Post by dml »

Cyprian wrote: Wow, I really like topics like this :D
It's nice to be back in the Atari community :)
Cyprian wrote: Dal, please share the source code for that converter.
The source will definitely be made public. I'm holding it just long enough to make it modular, so it can be extended properly. It's not quite ready yet but in a few days or a week will be open.
Cyprian wrote: About a year ago I was struggling with such a tool, but I got stuck on color quantization for Spectrum 512 color allocation (48 colors, changed every 12 pixels).
It is a difficult problem. I have looked at different solutions to it, but it is effectively an NP-complete problem, or at minimum NP-hard. An optimal solution isn't obvious and perhaps not possible.

The old PCS (implemented in 68k asm - which is why it was quick, but also a bit too simplistic) used reverse-carry indexing to subdivide spans of pixels progressively, to make colour allocation more 'fair'. But it was really just sidestepping a very difficult problem.
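
For readers unfamiliar with the term, the sketch below shows one common meaning of reverse-carry indexing: counting with the carry propagating from the top bit downwards, which is the same as visiting indices in bit-reversed order. This is an assumption about what is meant; how the old PCS actually applied the ordering to pixel spans is not described in the post, and the function name is hypothetical.

```python
def reverse_carry_order(bits):
    """Return 0 .. 2**bits - 1 in bit-reversed (reverse-carry) order."""
    order = []
    for i in range(1 << bits):
        reversed_index = 0
        for b in range(bits):
            if i & (1 << b):
                # Mirror bit b to the opposite end of the word.
                reversed_index |= 1 << (bits - 1 - b)
        order.append(reversed_index)
    return order


if __name__ == "__main__":
    # Visits the middle of the span first, then the quarters, and so on.
    print(reverse_carry_order(3))
```

Visiting span positions in this order halves the largest unvisited gap at each power of two, which is one way to spread a scarce resource evenly across a span.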

I have a new algorithm to try, which uses weighted bins with every colour allocated to every palette index, and progressively re-balances colours so they get allocated to bins where they can be merged with the minimum error. This is similar to the cube-splitting colour reduction algorithm but better at dealing with these overlapping palettes....

I have some other improvements to make which are perception-oriented, such as managing colours which exist only at edges or are isolated, versus colours within fine gradients. There is also the issue of error diffusion and how best to manage the error. Lots of stuff to try.
Cyprian wrote: Actually I tried to implement a blitter Spectrum mode that changes color every 8 pixels. The asm routine was done, but I had no way to convert any image to that format...
Well hopefully this tool will make that easier. Just implement your own display code, and provide sync tables for the colour change timing. Most of the rest will be solved. If different colour reduction, dithering or device layouts are needed those can be replaced too.
DarkLord
Ultimate Atarian
Posts: 5537
Joined: Mon Aug 16, 2004 12:06 pm
Location: Prestonsburg, KY - USA

Re: 4096 colors

Post by DarkLord »

Crap - have to wait until the full version is out, or go dig out my Mega STe. I've got
it packed away in a closet right now (shame, I know, but no room ATM).

Thanks for the update(s) Doug! :cheers:
Welcome To DarkForce! http://www.darkforce.org "The Fuji Lives.!"
Atari SW/HW based BBS - Telnet:darkforce-bbs.dyndns.org 1040
calimero
Fuji Shaped Bastard
Posts: 2624
Joined: Thu Sep 15, 2005 10:01 am
Location: Serbia

Re: 4096 colors

Post by calimero »

Just to ask the same question as Nativ:

so is there no CPU time left if you make 3 palette switches (48 colors) per scanline?

It should be something like 3 * 16 * move.w (one palette entry is 9 bits, so it is a word?) per scanline, right?
using Atari since 1986. http://wet.atari.org http://milan.kovac.cc/atari/software/ ・ Atari Falcon030/CT63/SV ・ Atari STe ・ Atari Mega4/MegaFile30/SM124 ・ Amiga 1200/PPC ・ Amiga 500 ・ C64 ・ ZX Spectrum ・ RPi ・ MagiC! ・ MiNT 1.18 ・ OS X
dml
Fuji Shaped Bastard
Posts: 3954
Joined: Sat Jun 30, 2012 9:33 am

Re: 4096 colors

Post by dml »

calimero wrote: Just to ask the same question as Nativ:
so is there no CPU time left if you make 3 palette switches (48 colors) per scanline?
There are 48 (3 x full set of 16) colour changes per scan, yes. However there are more like 50-60 colours *available* on each scanline (I didn't calculate the true number out of laziness) because each scan refers to part of the last palette on the previous scan. So there are approx 4 palettes per scanline used in the display - a little less than that, because of this borrowing effect.

In terms of data transferred however - that's 3 palettes per line.
calimero wrote: It should be something like 3 * 16 * move.w (one palette entry is 9 bits, so it is a word?) per scanline, right?
Something like that. There are different ways to load the colours but it amounts to the same process overall. It is words yes. 7 (or 4) bits are wasted depending on ST/STE.
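
To make the word layout concrete, here is a short Python sketch of how a colour is packed into one 16-bit palette word on each machine. The ST uses 3 bits per channel (9 of 16 bits, 7 wasted); the STE uses 4 bits per channel (12 of 16 bits, 4 wasted), with the least significant bit of each channel stored in bit 3 of its nibble for ST compatibility. The function names are made up for illustration.

```python
def st_colour(r, g, b):
    """Pack 3-bit channels (0..7) into an ST palette word, $0RGB."""
    return ((r & 7) << 8) | ((g & 7) << 4) | (b & 7)


def ste_nibble(level):
    """Convert a 4-bit level (0..15) to the STE register nibble.

    The upper three bits sit where the ST bits are; the extra least
    significant bit is stored in bit 3 of the nibble.
    """
    return ((level >> 1) & 7) | ((level & 1) << 3)


def ste_colour(r, g, b):
    """Pack 4-bit channels (0..15) into an STE palette word."""
    return (ste_nibble(r) << 8) | (ste_nibble(g) << 4) | ste_nibble(b)


if __name__ == "__main__":
    print(hex(st_colour(7, 7, 7)), hex(ste_colour(15, 15, 15)),
          hex(ste_colour(1, 0, 0)))
```

Either way the CPU or blitter moves one full word per colour, which is why the transfer cost per palette entry is the same on both machines.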

There is actually some time left to load more colours but not an entire palette - a partial palette. I think some people do this to get more colours overall but this can cause the colour changes to spread out into the borders and reduce the 'change density' during display time.

As for doing 'other things' with the time - that is also possible, but seriously difficult. If the top and bottom border is unused (as with most of the display routs) then you have that time to do other things. Maybe 15% CPU approx (somebody will have better figures).

If you want to use cycles within the scanline - that's the domain of demo programming. Whatever you do, it has to be extremely tight, very fine grained (to fit in small cycle windows) and have constant time overhead (no wobble). No multiplications, shifts or other variable-time operations as these would require re-synchronizing the CPU with the display 'timer'.

I once considered implementing a small 'virtual cpu' inside the scanline of an overscan display rout, in order to execute code in a regulated way, from inside that. But it's pretty hard, and you're effectively implementing a new cpu or virtual machine, and then have to write code for that somehow. Painful and not recommended. Might be quite confusing for coders watching it though. :)
nativ
Fuji Shaped Bastard
Posts: 4114
Joined: Mon Jul 30, 2007 10:26 am
Location: South West, UK

Re: 4096 colors

Post by nativ »

dml wrote: I once considered implementing a small 'virtual cpu' inside the scanline of an overscan display rout, in order to execute code in a regulated way, from inside that. But it's pretty hard, and you're effectively implementing a new cpu or virtual machine, and then have to write code for that somehow. Painful and not recommended. Might be quite confusing for coders watching it though. :)
In a couple of the recent Atari Arcade > Atari ST fixes Klaz turned up some 6502 'cores' that were left to run the game logic, I believe.

If you used a 'widescreen' left and right border ( perhaps containing the Score? ).... this would still be a ***Full colour screen*** ??? 8O

Have you ever seen Gauntlet IV on the MegaDrive? not sure how many colours it uses, but there's a funky overscan full screen to the game!
calimero
Fuji Shaped Bastard
Posts: 2624
Joined: Thu Sep 15, 2005 10:01 am
Location: Serbia

Re: 4096 colors

Post by calimero »

Thanks for the reply, dml.

so you essentially load new colors in color tables as scanline progress (e.g. you will have one new color every xx pixels)?
would it be possible to reserve e.g. two color registers for simple sprite/cursor over static image (would there be enough CPU time to draw/mask cursor)?

And if you use the blitter for setting the color tables, will it free up some more CPU?

I ask all this to see if it is possible to write point and click adventure with 512 colors...
dml
Fuji Shaped Bastard
Posts: 3954
Joined: Sat Jun 30, 2012 9:33 am

Re: 4096 colors

Post by dml »

calimero wrote: so you essentially load new colors in color tables as scanline progress (e.g. you will have one new color every xx pixels)?
Yes. The palettes are 'skewed' because one change occurs every xx pixels, and it's a different index changing each time. Some pixels can see into more than one palette at a time, but never more than 16 colours available at any time per pixel of course.

The code used to do the transfer decides the spacing between colour changes.
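
The 'skewed palette' idea can be sketched as follows. This Python example assumes a purely hypothetical layout in which registers 0..15 are rewritten in order, one write every `spacing` pixels starting at `first_x`, for three palettes per line; real display routines have their own uneven timings, which is what the sync tables describe.

```python
def visible_generation(x, index, first_x=0, spacing=4, palettes=3):
    """How many times register `index` has been rewritten by pixel x.

    0 means the pixel still sees the value left over from the previous
    scanline; 1..palettes selects one of this line's palettes.
    """
    first_write = first_x + index * spacing
    if x < first_write:
        return 0
    # After the first write, the same register comes round again every
    # 16 * spacing pixels.
    return min(palettes, (x - first_write) // (16 * spacing) + 1)


if __name__ == "__main__":
    # Which palette does each register hold at pixel 100?
    print([visible_generation(100, i) for i in range(16)])
```

At any one pixel the sixteen registers hold a mix of generations, which is why a pixel can 'see into' more than one palette while still being limited to 16 colours.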
calimero wrote:would it be possible to reserve e.g. two color registers for simple sprite/cursor over static image (would there be enough CPU time to draw/mask cursor)?
Yes you can reserve colour registers (if you want to mask sprites or other graphics on top) and you can probably reserve a plane or two if required (if you want to draw without masks) - but the more you reserve the more damage you'll do to the image.

I'll see if I can incorporate the 'reserved colours & planes' idea into PCS v5, in case it helps with projects like this. It should be reasonably straightforward (parameter + new conversion tables), although a specific display routine will be needed for each permutation.

CORRECTION - that's not really true: a standard display rout would work too, but you'll get more colours on screen if the bandwidth isn't wasted transferring the same colours to the same regs over and over, and that means a custom routine. Best to work with a standard display first and pay the price of a few colours, then claim some colours back with a better version later.
calimero wrote: And if you use the blitter for setting the color tables, will it free up some more CPU?
No because the blitter hogs the bus while it is working, locking out the CPU. It just takes less time to do its work. It's not really practical to try to 'buy back' time within scanlines used in this way (for overscan, or palette boosting). At best you can reserve colours or change colours at different rates. The CPU time available is based on how many scanlines you 'own' for this task, and how many remain free.

e.g. an image with left/right overscan will use the same amount of cpu as a normal image (but fewer colours possible per scanline, because some time needed to do overscan). Free CPU will be the same.
calimero wrote:I ask all this to see if it is possible to write point and click adventure with 512 colors...
I think it is likely you can do this yes. You'll need to write efficient code so it can execute in the remaining 10-20% CPU time without feeling unresponsive. Processes which update the screen would be best done incrementally / as changes, rather than drawing something every single frame (except the mouse cursor of course!). Even updating text should be done gradually a character at a time, 'threaded' with other work. A little work scheduler would be a good idea.
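
The 'little work scheduler' suggestion could look something like the sketch below. It is written in Python purely to show the structure (a real ST version would be 68k asm running in the free border time); the class, the per-frame step budget and the `type_text` task are all hypothetical names for illustration.

```python
from collections import deque


class FrameScheduler:
    """Round-robin scheduler that runs a fixed number of small steps per frame."""

    def __init__(self, steps_per_frame):
        self.steps_per_frame = steps_per_frame
        self.tasks = deque()

    def add(self, task):
        """Queue a generator; each `yield` marks the end of one small step."""
        self.tasks.append(task)

    def run_frame(self):
        """Run up to steps_per_frame steps, then return to the display code."""
        done = 0
        while self.tasks and done < self.steps_per_frame:
            task = self.tasks.popleft()
            try:
                next(task)
            except StopIteration:
                pass  # task finished, drop it
            else:
                self.tasks.append(task)  # not finished, requeue at the back
            done += 1
        return done


def type_text(text, screen):
    """Incremental task: emit one character per step."""
    for ch in text:
        screen.append(ch)
        yield


if __name__ == "__main__":
    screen = []
    sched = FrameScheduler(steps_per_frame=2)
    sched.add(type_text("HELLO", screen))
    while sched.tasks:
        sched.run_frame()
        print("".join(screen))
```

Because every task yields after a small, bounded amount of work, the time spent outside the display routine stays predictable from frame to frame.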

I wish you success with your plan :)
dml
Fuji Shaped Bastard
Posts: 3954
Joined: Sat Jun 30, 2012 9:33 am

Re: 4096 colors

Post by dml »

I just took apart the old 'tobias richter' slideshow of mine to find out what the overscan/colour boost stuff was doing. It works quite differently from PCS and the image quality isn't great. The images are double-height interlaced (must be around 400 or 500 lines) and there is only one palette change per scan. It doesn't look like I was short of cycles for colour changes - it looks more like I didn't produce a sophisticated enough colour reduction routine to handle skewed palettes at the time (must have been before PCS then). So there is indeed plenty of dead time to load more colours in overscan mode.

The overscan code used there is not very compact - there wasn't any optimisation required to load one palette - but amazingly it does work in Steem (!?). That's one solid emulator...
Nyh
Atari God
Posts: 1533
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

Re: 4096 colors

Post by Nyh »

calimero wrote: Just to ask the same question as Nativ:

so is there no CPU time left if you make 3 palette switches (48 colors) per scanline?

It should be something like 3 * 16 * move.w (one palette entry is 9 bits, so it is a word?) per scanline, right?
There is some time left. One of the projects I still want to do is writing the most colorful game for the ST ever. The game is planned with a maximum of 44 colors per scanline. I have written a proof-of-concept code part. Before I start this game I want to finish another game that has been sitting on my hard disk for more than a year. Then I have to write some very specific color quantization routines to create the graphics for the game. I think this will be a special research project for me into color quantization. How to make the most of the limited colors available is not a trivial task. Once past this difficult but interesting hurdle comes the most difficult task: creating beautiful graphics and putting the game together.

The answer to your question is yes, I think it is possible to make a game for the ST using 44 colors per scanline.

Hans Wessels
evil
Captain Atari
Posts: 285
Joined: Sun Nov 12, 2006 8:03 pm
Location: Devpac

Re: 4096 colors

Post by evil »

dml wrote: I notice that today's experiments with STEEM show the familiar 1-pixel vertical lines in PCS images when the colour allocation rules don't exclude recently changed palette indices. These lines are stable under STEEM but in my original experiments (old Mega4) this line would be noisy and subject to machine startup conditions. There's a 1-clock 'window of uncertainty' involved. I have attached a pic (excuse the terrible quality - just a simplified test).
I'm on OS X so using Hatari for cross-dev. Hatari has been rock-solid for timings in fullscreen, blitter and Spectrum pictures. Plus it's still developed very actively by Mr. Styckx, who himself is an expert on ST low-level coding (see for example the great No Cooper demo). I can't but recommend giving it a go, even if the UI is still a bit boring.

As a bonus, it will run Apex for you :-)
dml wrote: Looks decent to me. It's worth trying. I'm still working on the new convertor, but after some fuss porting the image library (eek) I have it basically working now so I might shift back to display stuff for a bit once I'm happy most of the 'old functionality' is in.
Well, I did forget one tiny little thing: the blitter source needs updating as well, and that will be a problem, not just in cycles; starting the blitter in "mid palette" will fork up the source x/y inc.

The solution is probably a giant blitter pass from the first scanline down to the lower border scanline. The blitter will fill out 64 colours per line (the palette data has padded space for the extra 8 colours). 64 colours will be exactly 512 cycles, so it fits well. The Zerkman blitter mode works like that (in the Antiques demo). Once the first 228 lines of the big blitter pass are done, the lower border needs to be handled carefully: preload CPU registers with 8 colours so they can be movem'ed out (40 cycles), and start the blitter twice (once for the intermediate lower-border line, and once to start a big pass for the remaining 44 lines).
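
The padded data layout for such a single big blitter pass could be prepared along these lines. This is a Python sketch of the converter side only, assuming 56 used colours per line padded to 64 words so that each row costs 64 x 8 = 512 cycles; the function name and the fill value are hypothetical.

```python
WORD_CYCLES = 8  # blitter cycles per palette word, as used in the thread


def pad_rows(lines, used=56, row_words=64, fill=0x000):
    """Flatten per-line colour lists into one buffer of fixed-size rows."""
    buffer = []
    for colours in lines:
        if len(colours) != used:
            raise ValueError("expected %d colours per line" % used)
        buffer.extend(colours)
        # Padding keeps every row exactly one scanline long for the blitter.
        buffer.extend([fill] * (row_words - used))
    return buffer


if __name__ == "__main__":
    data = pad_rows([[0x777] * 56, [0x700] * 56])
    print(len(data), 64 * WORD_CYCLES)
```

With every row the same length, the blitter source can advance linearly and no per-line source correction is needed until the lower border.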

I'll think about this some more, but for sure 228 lines shouldn't be a problem, the Zerkman rout should already be able to do that if he had killed the top border.

dml wrote: I also still need to sort out an assembler to use along with gcc - either vasm or just put up with gas (yuck). I haven't checked the object format for vasm but if it's compatible with the gcc linker I'll just use it. Having to use devpac with cross-dev is uncomfortable. It's ok when you're working exclusively on the native machine but painful from the PC and my Ataris are still being put back together...
With vasm comes vbcc and vlink as well, the object formats between the vbcc/vasm are of course compatible with vlink :)

Example makefile for a one-object assembler project (I'm a strange person who never learned C but does everything in assembler instead..):

Code:

PATH := $(PATH):/usr/local/bin:/opt/vbcc/bin

CC = /opt/vbcc/bin/vc
ASM = /opt/vbcc/bin/vasm
LD = /opt/vbcc/bin/vlink
CFLAGS	= -cpu=68000 -O1
ASMFLAGS = -m68000 -Felf -noesc -nosym -quiet -no-opt
LDFLAGS = -bataritos -tos-flags 7
LOADLIBES = 
LDLIBS =

PRG = main.tos
OBJ = main.o
SRC = main.s

.PHONY:	main.s	# always rebuild target

all : $(PRG)

install : all
	mcopy -o main.tos e:main.tos
	sync

$(PRG):	$(OBJ)
	$(LD) $< $(LDFLAGS) -o $@
.c.o:
	$(CC) -c $(CFLAGS) $<
.s.o:
	$(ASM) $(ASMFLAGS) $< -o $@

main.o:	$(SRC)

clean:	
	rm -f $(PRG) $(OBJ)
I'm using it with Eclipse or Xcode; both do 68k syntax highlighting without further setup.
Moved from Xcode to Eclipse to become a little more platform independent.
dml wrote: It still using a very simplistic colour allocator but I have now sorted out the nasty strobing issue :-z. The dual dithering in the old version looks like it was actually wrong and I didn't notice. So the average intensity of the frames was not exactly equal and therefore flickered more than necessary. gaaah :-( So the DHS 'complaint' was accurate but not for the reasons assumed!
Cool, looking forward to the improved colour allocator. It would also be very neat if it could handle any Y resolution (can even easily hardscroll that on ST with 8 scans at a time), then with a good dither.. Wow, all I need to do then is run the source 24-bit image once through Photochrome and be done with it. What an improvement! Might be a good time for making a slideshow then :)
calimero
Fuji Shaped Bastard
Posts: 2624
Joined: Thu Sep 15, 2005 10:01 am
Location: Serbia
Contact:

Re: 4096 colors

Post by calimero »

Nyh wrote: There is some time left. One of the projects I still want to do is writing the most colorful game for the ST ever. The game is planned with a maximum of 44 colors per scanline. I have written a proof of concept code part. Before I start this game I want to finish another game sitting on my harddisk for more than a year. Then I have to write some very specific color quantization routines to create the graphics for the game. I think this will be a special research project for me into color quantization. How to make the most of the limited colors available is not a trivial task. Having done this difficult but interesting hurdle comes the most difficult task: creating beautiful graphics and put the game together.

The answer to your question is yes, I think it is possible to make a game for the ST using 44 colors per scanline.

Hans Wessels
I have in mind: wet.atari.org - I already wrote the game engine in GFA Basic; what is left is to cut/convert the complete graphics from the PC.
For a Spectrum 512-like version of the game, everything would have to be rewritten in pure asm anyway... :/

Maybe Elansar could be made with the Spectrum 512 technique :)

Either way, I would suggest converting a PC/Mac game for this project.
using Atari since 1986. http://wet.atari.org http://milan.kovac.cc/atari/software/ ・ Atari Falcon030/CT63/SV ・ Atari STe ・ Atari Mega4/MegaFile30/SM124 ・ Amiga 1200/PPC ・ Amiga 500 ・ C64 ・ ZX Spectrum ・ RPi ・ MagiC! ・ MiNT 1.18 ・ OS X
wongck
Ultimate Atarian
Posts: 13509
Joined: Sat May 03, 2008 2:09 pm
Location: Far East

Re: 4096 colors

Post by wongck »

calimero wrote: I have in mind: wet.atari.org - I already wrote game engine in GFA basic, what is left is ....
yeah was waiting for that game to come out.... for some time now.
My Stuff: FB/Falcon CT63 CTPCI ATI RTL8139 USB 512MB 30GB HDD CF HxC_SD/ TT030 68882 4+32MB 520MB Nova/ 520STFM 4MB Tos206 SCSI
Shared SCSI Bus:ScsiLink ethernet, 9GB HDD,SD-reader @ http://phsw.atari.org
My Atari stuff that are no longer for sale due to them over 30 years old - click here for list
dml
Fuji Shaped Bastard
Posts: 3954
Joined: Sat Jun 30, 2012 9:33 am

Re: 4096 colors

Post by dml »

Well the other night I was playing with the reworked convertor and I managed to do a few new things with it.

1) Change the pixel reduction order from recursive subdivision of each scanline (but otherwise still scanline order) to using a linear congruential generator, sampling the full image once in pseudo-random order. The error is better distributed because subsequent scanlines aren't suffering from palette choices already fixed at the end of the previous line.

2) Switch the error calculation into CIELAB space (reference white = D65), instead of RGB space, in an attempt to make error less perceptible to the eye, versus just going for the minimum numerical error in the palette values themselves. This conversion is expensive and stops an ST dead - but I'm testing on a PC so that's ok.

3) Fix the dithering code - separating the dither step into flicker-management and error-dither steps which are orthogonal to each other. I haven't finalised dithering yet but this version is at least functionally correct.
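
To make point 1 concrete, here is a minimal sketch of visiting every pixel exactly once in LCG order. This is an illustration and not the convertor's actual code: the function name, the multiplier and the increment are picked for the example (common textbook constants), and rounding the modulus up to a power of two and skipping the overshoot is just one way to get a full-period walk.

```python
def lcg_pixel_order(width, height, a=1664525, c=1013904223, seed=0):
    """Yield every (x, y) of a width*height image once, in LCG order.

    Assumption for this sketch: the modulus m is the next power of two
    >= width*height. With c odd and a % 4 == 1 the generator has full
    period m (Hull-Dobell), so each index below m shows up exactly once;
    indices that fall outside the image are simply skipped.
    """
    n = width * height
    m = 1
    while m < n:
        m <<= 1
    x = seed % m
    for _ in range(m):
        if x < n:
            yield x % width, x // width
        x = (a * x + c) % m


order = list(lcg_pixel_order(320, 200))
print(len(order), len(set(order)))
```

Because the walk covers the whole image before any scanline is finished, no scanline gets its palette fixed before its neighbours have had a say.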

I still need to write the new reduction/bin-rebalancing algorithm but I have it figured out; will try when I get some time. It will need a lot of memory at runtime and will probably be quite expensive, but it should do a very good job I think.

First test output image here:

https://dl.dropbox.com/u/12947585/TEST2.PCS (will only look correct using an STE or at least STE emulation)

I don't have the original reference PCS handy but I'll generate it later and edit this post. I'll update the tool soon to emit error analysis images and metrics to help figure out which settings/methods work better.

@evil - I'll reply to your post when I get home!
Nyh
Atari God
Posts: 1533
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

Re: 4096 colors

Post by Nyh »

dml wrote:First test output image here:

https://dl.dropbox.com/u/12947585/TEST2.PCS (will only look correct using an STE or at least STE emulation)
It looks like this:
TEST2.png
Hans Wessels
evil
Captain Atari
Posts: 285
Joined: Sun Nov 12, 2006 8:03 pm
Location: Devpac

Re: 4096 colors

Post by evil »

dml wrote:Well the other night I was playing with the reworked convertor and I managed to do a few new things with it.
https://dl.dropbox.com/u/12947585/TEST2.PCS (will only look correct using an STE or at least STE emulation)
Nice one :) The flicker is close to none, and the interlaced image is really getting there. Looking forward to seeing the original 24-bit to compare :)

Here's the pic two frames and combined (as it would look interlaced):
Image

I'd say it's getting time for a trickier picture with more sideway colours :)

Update: Note to myself: always reload page before doing a new post.
Last edited by evil on Wed Aug 29, 2012 4:58 pm, edited 1 time in total.
nativ
Fuji Shaped Bastard
Posts: 4114
Joined: Mon Jul 30, 2007 10:26 am
Location: South West, UK

Re: 4096 colors

Post by nativ »

Atari STFM 512 / STe 4MB / Mega ST+DSP / Falcon 4MB 16Mhz 68882 - DVD/CDRW/ZIP/DAT - FDI / Jaguar / Lynx 1&2 / 7800 / 2600 / XE 130+SD Card // Sega Dreamcast / Mega2+CD2 // Apple G4

http://soundcloud.com/nativ ~ http://soundcloud.com/nativ-1 ~ http://soundcloud.com/knot_music
http://soundcloud.com/push-sounds ~ http://soundcloud.com/push-records
dml
Fuji Shaped Bastard
Posts: 3954
Joined: Sat Jun 30, 2012 9:33 am

Re: 4096 colors

Post by dml »

evil wrote: I'm on OS X so using Hatari for cross-dev.
I have done quite a bit of work on the Mac but I'm mostly PC based. I'll probably have a PowerBook soon if work pays for it, and Frank B is going to test my makefile under OS X in the meantime. :)
evil wrote: Hatari has been rock-solid for timings in fullscreen, blitter and Spectrum pictures. Plus it's still developed very activly by Mr.Styckx who himself is an expert on ST lowlevel coding (see for example the great No Cooper demo).
I've actually been using Hatari for my Falcon cross-dev experiments :) I can boot a copy of my old Falcon in aranym, and build files to a share, and hatari looks into the share, also configured as a Falcon.

(I can probably drop aranym for most things now that I have access to compilers and assemblers on other host platforms)
evil wrote: As a bonus, it will run Apex for you :-)
:-) Does it emulate the DSP morphing code properly? It runs DSPBENCH OK, which is host-port oriented, so I guess that's a good sign.
evil wrote: Well I did forget one tiny little thing; the blitter source needs updating as well, and it will be a problem, not just by cycles, but to start the blitter in "mid palette" will fork up the source x/y inc.
I had another thought about this - might be some value in setting up a huge blit, but not starting it in hog mode. Restart it where necessary but don't reload the registers, and put it to sleep for CPU palette updating (if required). Doesn't solve the mid-palette problem but it might work well with a hybrid blit/cpu mixture. I can't remember if the blitter can be paused like this in nice mode or if it just resets the counters too.
evil wrote: the first 228 lines of a big blitter-pass, the lower border needs taken care of real good. Preloaded CPU registers with 8 colours, so they can be movem'ed out (40 cycles), starting blitter twice (once for the intermediate lower-border line, and once for starting up a big pass for the remaining 44 lines).
I'll think about this some more, but for sure 228 lines shouldn't be a problem, the Zerkman rout should already be able to do that if he had killed the top border.
Do you work out the timings in your head? :) I think I did some sort of stupid 'ruler' bitmap for the original PCS and then watched where the colour changes affected the ruler divisions. Hehe. But later on I did it a bit more properly.

BTW if the scanline for lower border removal needs special colour reduction rules that shouldn't be a problem. I'm planning to allow specialisation per scan for this purpose.
evil wrote: Example makefile for a one-object assembler project (I'm a strange person who never learned C but does everything in assembler instead..):
Cool. It might be interesting to build an object file converter for the gcc linker, so this stuff can be linked into C based tools etc. Not at the top of my list, but I've done a lot with ELF and gcc so it might not be that hard to do.
evil wrote: Moved from Xcode to Eclipse to become a little more platform independent.
I'm not that keen on Xcode - will probably adopt Eclipse if/when I hop to Mac.
evil wrote: Cool, looking forward to the improved colour allocator. It would also be very neat if it could handle any Y resolution (can even easily hardscroll that on ST with 8 scans at a time), then with a good dither.. Wow all I need to do then is to run the source 24-bit image one time through Photochrome and be done with it. What an improvement! Might be a good time for a making a slideshow then :)
I'll do an earlier release with height as an argument - maybe width too. But the plan is to make a profile syntax so you can describe the format you want, colour changes per line, planes, reserved colours, the default ruletable and overloads for specific scans, and so on. Some will be bundled/builtin but you can craft your own. More advanced stuff may involve changing the code but in most cases adding new modules. We'll see how it goes.

Will try the new colour reduction stuff and will tidy it up a bit for another test release and then work on the profiles. I'll make the code available as soon as it looks like the basic layout won't change too much.

The silly reduction algorithm is 90% written - everything except the bin-pruning step. It is very memory and CPU greedy (I didn't add it up but it must be something like 20 MB for a 320x200 image already, haaaha!). If it makes a better image though, that will be interesting.

cheers again for the input!
dml
Fuji Shaped Bastard
Posts: 3954
Joined: Sat Jun 30, 2012 9:33 am

Re: 4096 colors

Post by dml »

Got the new reduction algorithm working, albeit not complete. Wasn't too successful merging colour bins in CIELAB space - I think the space is strangely curved and merging the bins can result in illegal colours. Or my conversion has a bug which only shows when merging colours in that space. Or something related.

Anyway in RGB space it works fine. First results look pretty good. Still some issues though - namely:

1) not happening in the favoured colour space - since I'm probably generating illegal colours, I will change it to compute error in CIE but merge/average final bins in RGB - that should be near enough
2) scanline-to-scanline coherence is slightly worse than the other algorithms, because the palette solutions are even more local than before. I'm sure this can be solved in a nice way by managing which pixels the bins look at (colour-related pixels outside the scan in question) (not yet done but this algo makes that practical to do).
3) haven't implemented any of the other colour-perception tricks yet, which I think will help with efficient allocation
4) it's still very very memory greedy
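
For point 1, the split I have in mind looks roughly like the sketch below: measure error in CIELAB (D65 white) but average bins in RGB, since a weighted average of two RGB values always stays inside the RGB cube. This is a standalone illustration, not the convertor's code. It assumes the source pixels are sRGB, the bin layout `(mean_rgb, pixel_count)` is hypothetical, and all function names are made up.

```python
D65 = (0.95047, 1.0, 1.08883)  # reference white (X, Y, Z)


def srgb_to_linear(v):
    """Undo the sRGB transfer curve for one 0..255 channel value."""
    v /= 255.0
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4


def rgb_to_lab(rgb):
    """Convert an (r, g, b) triple, assumed to be sRGB, to CIELAB under D65."""
    r, g, b = (srgb_to_linear(v) for v in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        d = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > d ** 3 else t / (3 * d * d) + 4.0 / 29.0

    fx, fy, fz = f(x / D65[0]), f(y / D65[1]), f(z / D65[2])
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def lab_error(rgb_a, rgb_b):
    """Squared CIELAB distance between two RGB colours."""
    return sum((p - q) ** 2 for p, q in zip(rgb_to_lab(rgb_a), rgb_to_lab(rgb_b)))


def merge_bins_rgb(bin_a, bin_b):
    """Merge two (mean_rgb, pixel_count) bins by weighted average in RGB."""
    (ca, na), (cb, nb) = bin_a, bin_b
    n = na + nb
    return tuple((p * na + q * nb) / n for p, q in zip(ca, cb)), n
```

Candidate merges would be ranked with `lab_error`, and the pair that wins is then combined with `merge_bins_rgb`, so the stored bin colour is always a legal RGB value.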

Having said all that, the palette solving *within* each scanline looks excellent, better than the others I tried. Here's a sample from the unfinished code:

https://dl.dropbox.com/u/12947585/TEST3.PCS

UPDATED: One more version with the CIE bug (!) fixed and dithering back on.

https://dl.dropbox.com/u/12947585/TEST4.PCS
dml
Fuji Shaped Bastard
Posts: 3954
Joined: Sat Jun 30, 2012 9:33 am

Re: 4096 colors

Post by dml »

Right after quite a few changes and a few false starts (my last test I think was bogus, there was a bug in the compressor which was aborting the save file step, so I don't know what stale thing I actually uploaded in the end) - I have something I think I'm happy with. For now anyway.

- new exhaustive, iterative bin-balancing algorithm, working entirely in RGB space (I'm not really sure CIE space is that helpful when you have colour quantization in the mix - I think it can encourage false duplicate colours, complicating reduction, and the image suffers. Will return to it another time)
- added a visual perception filter mask, to attenuate error computation for edges, so gradients get more attention from the allocator. the impact of this is quite significant. happy with that.
- emit some diagnostic images containing error information, the filter mask and separate/combined image fields

The perception filter makes the new reduction algorithm worthwhile. It sorts out the scan-to-scan coherence problem which makes the image look streaky because (being kind of similar to a sobel/edgedetect filter) it makes each scan slightly aware of the scans above and below, so the colours can't end up too far apart. The whole image is therefore being solved simultaneously, instead of behaving like one very long scanline. That's nice.
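
Roughly what such a mask could look like, as a hedged sketch and not the actual filter in the tool: a plain Sobel gradient on integer luminance, turned into a per-pixel weight that is 1.0 in flat or gently shaded areas and drops towards hard edges. The `strength` parameter, the weight formula and the names are assumptions for illustration.

```python
import math


def perception_mask(image, strength=4.0):
    """Return per-pixel error weights in (0, 1] for rows of (r, g, b) pixels.

    Sketch only: weight = 1 / (1 + strength * edge), where edge is the
    Sobel gradient magnitude of the luminance, scaled so a full
    black-to-white step gives 1.0. Borders are handled by clamping.
    """
    h, w = len(image), len(image[0])
    # Integer luminance (scaled by 1000) keeps flat areas at exactly zero gradient.
    lum = [[299 * r + 587 * g + 114 * b for r, g, b in row] for row in image]

    def at(x, y):
        return lum[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            gx = (at(x + 1, y - 1) + 2 * at(x + 1, y) + at(x + 1, y + 1)
                  - at(x - 1, y - 1) - 2 * at(x - 1, y) - at(x - 1, y + 1))
            gy = (at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1)
                  - at(x - 1, y - 1) - 2 * at(x, y - 1) - at(x + 1, y - 1))
            edge = math.sqrt(gx * gx + gy * gy) / (4 * 255 * 1000)
            row.append(1.0 / (1.0 + strength * edge))
        mask.append(row)
    return mask
```

Multiplying each pixel's colour error by its weight before allocating palette entries means errors on hard edges count for less and gradients get more of the budget. Since the 3x3 kernel reads the scans above and below, each scanline's solution also becomes slightly aware of its neighbours.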

Without the perception filter, the new reduction algorithm actually looks worse than the LCG reduction algorithm, which is immensely simpler and cheaper to run.

The LCG is still probably the best all-rounder because it gives great results at hardly any cost. It's probably my favourite solution. The bin-balancing thing is a behemoth algorithm but it has the edge for image quality in the end. The RGB error measurements prove it.

Anyway I'm about burned out with this problem for now - going to stop fiddling with it and tidy up the prog. I'll add the y/height parameter and do another test release, then get back to refactoring for fancy display profiles.

I might even get to try a new display routine using some of evil's suggestions with any luck!


Here's the most recent render, this time with reference image and the rest...

https://dl.dropbox.com/u/12947585/TEST5.zip