[CODE] Fast Sqrt rout

All 680x0 related coding posts in this section please.

Moderators: simonsunnyboy, Mug UK, Zorro 2, Moderator Team

User avatar
tobe
Atari God
Atari God
Posts: 1459
Joined: Sat Jan 24, 2004 10:06 am
Location: Lyon, France
Contact:

[CODE] Fast Sqrt rout

Post by tobe »

Hello,

I found this rout on the web and converted it to assembler :

Code: Select all

;-----------------------------------------------------------------------------
;  original C code by Jim Ulery :
;      static unsigned mborg_isqrt3(unsigned long val)
;      {
;        unsigned long temp, g=0, b = 0x8000, bshft = 15;
;        do {
;          if (val >= (temp = (((g<<1)+b)<<bshft--))) {
;            g += b;
;            val -= temp;
;          }
;        } while (b >>= 1);
;        return g;
;      }
;-----------------------------------------------------------------------------

fast_sqrt:
	movem.l		d1-d4,			-(sp)
	moveq		#0,				d1			; g = 0
	move.l		#$8000,			d2			; b = $8000
	moveq		#15,			d3			; bshft = 15
fast_sqrt_0:
	move.l		d1,				d4			; temp = g
	add.l		d1,				d4			; temp = g<<1
	add.l		d2,				d4			; temp = (g<<1)+b
	lsl.l		d3,				d4			; temp = ((g<<1)+b)<<bshft
	subq.l		#1,				d3			; bshft--
	cmp.l		d0,				d4			; val >= temp ?
	bhi.s		fast_sqrt_1					;	no -> fast_sqrt_1
	add.l		d2,				d1			; g += b
	sub.l		d4,				d0			; val -= temp
fast_sqrt_1:
	lsr.l		#1,				d2			; b >>= 1, b!=0 ?
	bne.s		fast_sqrt_0					;	yes -> fast_sqrt_0
	move.l		d1,				d0			; return g
	movem.l 	(sp)+,			d1-d4
	rts
I didn't tried to optimize it further, but if someone can optimize it :)

edit : i forgot, put the value in D0 and retrieve the result in D0 ;)
step 1: introduce bug, step 2: fix bug, step 3: goto step 1.
earx
Captain Atari
Captain Atari
Posts: 353
Joined: Wed Aug 27, 2003 7:09 am

Post by earx »

if you use it on 030 it should be great, but beware of the lsl.l. it is quite costly on 68000, since every bit it shifts costs 2 cycles. there are algorithms where you only need to shift 1 bit at a time.

this might be a better one: input in d1, output in d0

moveq #1,d2
ror.l #2,d2
moveq #$F,d3
.loop1: cmp.l d2,d1
bgt.s .endloop1
add.l d1,d1
lsr.l #1,d2
dbf d3,.loop1
moveq #0,d0
bra.s .is_null
.endloop1:

sub.l d2,d1
move.l d2,d0
lsr.l #1,d2
.loop2: lsr.l #1,d2
add.l d2,d0
cmp.l d0,d1
bgt.s .endloop2
sub.l d2,d0
add.l d1,d1
dbf d3,.loop2
bra.s .end
.endloop2:

sub.l d0,d1
add.l d2,d0
add.l d1,d1
dbf d3,.loop2

.end: add.l d0,d0
addi.l #$00008000,d0
.is_null:

happy coding
[ProToS]
Moderator
Moderator
Posts: 2242
Joined: Fri Sep 20, 2002 2:09 am
Location: Lourdes / France
Contact:

Post by [ProToS] »

from Stratos issue 1 (french mag)
(sorry the article was i french and I wasn't a coder ... so I only reworte all and no translation)

perhaps somebody could find this usefull ...

Code: Select all

----------cut---------------------------------------

- deux versions - rapide ou précise,
- auccune boucle,
- une (ou deux) divisions,
- 11 (ou 15) instructions,
- 2.5% (ou 0%) d'erreur moyenne.

------------cut-------------------------------------

Voyons d'abord la version la plus rapide. on
entre une valeur en d0.l, et on en sort sa racine
par d0.w. on utilise également d1 et d2; L'erreur
moyenne est de 2.5% avec un maximum de 6% si le bit
de poids le plus fort de l'entrée est impair.

fastsqrt1:	
		moveq	#31,d2
		bfffo	d0{0:31},d1
		sub.l	d1,d2		;y
		lsr.w	#1,d2		;w
		bcc.s	.round0		;arrondi
		addq.w	#1,d2
.round0:	
		move.l	d0,d1
		lsr.l	d2,d1		;z0
		divu.l	d1,d0		;itération
		add.l	d1,d0
		lsr.l	#1,d0
		rts
		
La version plus précise demande une itération 
supplémentaire. L'utilisation est identique, mais
on utilise en plus d3. L'erreur moyenne est alors
mesurable, sauf pour $7FFFFFFF où il y a une erreur
de 0.2%.

fastsqrt2:	
		moveq.l	#31,d2
		move.l	d0,d3
		bfffo	d3{0:31},d1
		sub.l	d1,d2
		lsr.w	#1,d2
		bcc.s	.round0
		addq.w	#1,d2
.round0:	
		lsr.l	d2,d1		;ordre 0
		divu.l	d1,d3		;ordre 1
		add.l	d3,d1
		lsr.l	#1,d1
		divu.l	d1,d0		;ordre 2
		add.l	d1,d0
		lsr.l	#1,d0
		rts
		
Conclusion

Voilà deux petites routines qui, je l'espère, vous
rendront de fiers services. Elles ont été réellement
testées et sont utilisées pour faire des FFT(transformé
rapide de Fourrier, qui sera aussi le sujet d'un prochain 
article) avec un simple 68030 (sans dsp!). 
SeeU
[ProToS]/Facebook
User avatar
tobe
Atari God
Atari God
Posts: 1459
Joined: Sat Jan 24, 2004 10:06 am
Location: Lyon, France
Contact:

Post by tobe »

Yes, my code is for 68000 :/

But i only use it to prepare a small table.

Thanks for all this routs ! Maybe someone should test and compare them !

Maybe there's some more interresting ones here :
http://www.azillionmonkeys.com/qed/sqroot.html
step 1: introduce bug, step 2: fix bug, step 3: goto step 1.
User avatar
Nyh
Atari God
Atari God
Posts: 1533
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

Small SQRT

Post by Nyh »

Well, this SQRT routine is not very fast. But it is very small:

; Call: D0 long int
; Return: SQRT(D0)
;
moveq #-1,d1
loop:
addq.l #2,d1
sub.l d1,d0
bcc.s loop
lsr.l #1,d1
move.l d1,d0
rts

Hans Wessels
User avatar
GT Turbo
Captain Atari
Captain Atari
Posts: 336
Joined: Tue Feb 17, 2004 9:41 am
Location: Alsace, France
Contact:

Post by GT Turbo »

Very short Nyh !!



GT Turbo :wink:
Cerebral Vortex : we do what we can !!
earx
Captain Atari
Captain Atari
Posts: 353
Joined: Wed Aug 27, 2003 7:09 am

Post by earx »

yep, mine is fastest, but mrni's is shortest ;-) (welcome back MrNI! :-) )
User avatar
tobe
Atari God
Atari God
Posts: 1459
Joined: Sat Jan 24, 2004 10:06 am
Location: Lyon, France
Contact:

Post by tobe »

earx wrote:yep, mine is fastest, but mrni's is shortest ;-) (welcome back MrNI! :-) )
Yes :)
A short one is nice too !

I'm going to test this routs today.
step 1: introduce bug, step 2: fix bug, step 3: goto step 1.
User avatar
tobe
Atari God
Atari God
Posts: 1459
Joined: Sat Jan 24, 2004 10:06 am
Location: Lyon, France
Contact:

Post by tobe »

Tests were done with a STE 1mb.
The routs posted by Protos use 68020+ instructions so i didn't test them.

Earx's rout is the faster, and the more accurate !

I did some modifications to make it works in the same manner than the others routs (in/out in d0, result in low word).

modified rout :

Code: Select all

fast_sqrt:
	moveq		#1,		d2
	ror.l		#2,		d2
	moveq		#$F,		d3
.loop1:
	cmp.l		d2,		d0
	bgt.s		.endloop1
	add.l		d0,		d0
	lsr.l		#1,		d2
	dbf		d3,.loop1
	moveq		#0,		d1
	bra.s		.is_null
.endloop1:
	sub.l		d2,		d0
	move.l		d2,		d1
	lsr.l		#1,		d2
.loop2:
	lsr.l		#1,		d2
	add.l		d2,		d1
	cmp.l		d1,		d0
	bgt.s		.endloop2
	sub.l		d2,		d1
	add.l		d0,		d0
	dbf		d3,.loop2
	bra.s		.end
.endloop2:
	sub.l		d1,		d0
	add.l		d2,		d1
	add.l		d0,		d0
	dbf		d3,.loop2
.end:
	add.l		d1,		d1
	addi.l		#$00008000,	d1
.is_null:
	swap		d1
	move.w		d1,		d0
	rts
Routs size :

Tobe : 36 bytes
Earx modified : 70 bytes
Nyh : 14 bytes

Comparative timings :

Tobe : 37820
Earx modified : 29241
Nyh : 93123

Source code :
You do not have the required permissions to view the files attached to this post.
step 1: introduce bug, step 2: fix bug, step 3: goto step 1.
User avatar
Nyh
Atari God
Atari God
Posts: 1533
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

Post by Nyh »

earx wrote:yep, mine is fastest, but mrni's is shortest ;-) (welcome back MrNI! :-) )
I will dig up my optimised routine this weekend (hope my Atari HD still works...). Let us see wether yours is still the fastest then. :-)

I have been away a long time. Wanted to do some coding again. C programming under Linux is not as much fun as some nice 68k assembly on the Atari. I was very happy with Steem to able run Pure C. Lots of improvement since I tried Steem some years ago.

Maybe I will finish some demo ideas I had some years ago.

Cu, Hans
earx
Captain Atari
Captain Atari
Posts: 353
Joined: Wed Aug 27, 2003 7:09 am

Post by earx »

MrNi: I dunno whether mine is the fastest. The one you showed at siliconvention seemed pretty good in the respect that it could also do sub integer, whereas mine only does a round. Would be nice to take a look at that one. :-)

Tobe: if you don't care about diskspace and do care about precalc time, it might even be a good idea to precalc the sqrt table using gfa.. ..and all this code boosting is really not required. But who cares, right ;-)
User avatar
Nyh
Atari God
Atari God
Posts: 1533
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

Post by Nyh »

earx wrote:MrNi: I dunno whether mine is the fastest. The one you showed at siliconvention seemed pretty good in the respect that it could also do sub integer, whereas mine only does a round. Would be nice to take a look at that one. :-)
Earx: there are some problems with your code.
sqrt(1) = 0??
and sqrt($77654321) gives also quite a wrong answer. :-(

I have converted the Stratos code to 68k asemly. I don't get their first guess. It is quite wrong IMHO. Also the number of iterations is a bit optimistic.

Code: Select all

sqrt_strt:
     cmpi.w    #2,d0       ; >2? dan wat doen, anders direct klaar
     bcc.s     .cont
     rts
.cont:
     move.l    d2,-(sp)
     moveq     #32,d2      ; bitcount
     move.l    d0,d1
     bmi.s     .start_l
.loop:
     subq.w    #1,d2       ;
     add.l     d1,d1       ; find bit
     bpl.s     .loop
.start_l:     
     lsr.w     #1,d2
     move.l    d0,d1       ; i
     lsr.l     d2,d1       ; tmp = i>>(bits/2) order 0
     move.l    d0,d2       ; i
     divu      d1,d2       ; i/tmp
     swap      d2
     clr.w     d2
     swap      d2
     add.l     d2,d1       ; tmp+=i/tmp
     lsr.l     #1,d1       ; tmp/=2            order 1
     move.l    d0,d2       ; i
     divu      d1,d2       ; i/tmp
     swap      d2
     clr.w     d2
     swap      d2
     add.l     d2,d1       ; tmp+=i/tmp
     lsr.l     #1,d1       ; tmp/=2            order 2
     move.l    d0,d2       ; i
     divu      d1,d2       ; i/tmp
     swap      d2
     clr.w     d2
     swap      d2
     add.l     d2,d1       ; tmp+=i/tmp
     lsr.l     #1,d1       ; tmp/=2            order 3
     move.l    d1,d0
     move.l    (sp)+,d2
     rts
You can still do a lot of optimising. For numbers <2^30 you can use word aritmethic for the iterations like this:

Code: Select all

     move.l    d0,d3       ; i
     divu      d1,d3       ; i/tmp
     add.w     d3,d1       ; tmp+=i/tmp
     lsr.w     #1,d1       ; tmp/=2            order 1
You can make the bitcount a lot more efficient. Use for the first guess a 256 byte lookup table and you will about double the speed. Is there any need for this?
User avatar
Shazz
Atari Super Hero
Atari Super Hero
Posts: 641
Joined: Wed Aug 27, 2003 9:27 am
Location: Montréal, Canada

Post by Shazz »

Eh eh, I was sure that a day my Atari fellows will do this kind of compo (By the way I won't try to beat Earx's TC line routine :D )

Tobe, as your goal is to build a lookup table you can have a look to our amy cousins : http://membres.lycos.fr/amycoders/compo/sqrtcompo.html
The competitions aim was to optimize a routine calculating a 64k sqrt-tab by size. The routine with the least bytes used was supposed to win.

Rules:

* Only 68020++ integer asm-commands.
* 65536 one byte values have to be calculated (64kbyte).
* The given range is 0..65535.
* The tables destination is a label called "sqrttab" in a bss-area.
* All registers are in an undefined state, when the routine is called.
* The routine has to be finished with "RTS", which is not accounted to the overall routine-length.
* All fraction parts have to be cut off (no rounding) (2.7 gets 2 then for example).
I did not check if all the routs use non compatible 68000 instructions...
...8bits are enough...
User avatar
tobe
Atari God
Atari God
Posts: 1459
Joined: Sat Jan 24, 2004 10:06 am
Location: Lyon, France
Contact:

Post by tobe »

I'm not trying to build an sqrt lookup table, but a table from the results of calls to sqrt. (and a table of 65536 words is quite big for my needs ;))

This is my first screen coded only with 68k assembler, and i'm trying to optimize both size and speed, for the fun !

Now i need to test this rout from Nyh.
step 1: introduce bug, step 2: fix bug, step 3: goto step 1.
User avatar
Nyh
Atari God
Atari God
Posts: 1533
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

SQRT the fastest

Post by Nyh »

Hi,

I dug up my SQRT routines and played aroud with the Newton approximations from the Stratos magazine. I got some interesting results.

There is a slight problem in the Earx routine. sqrt(1) is calculated with the wrong answer and so is sqrt(2003125025). The routine doesn't like numbers above 2^31. The routine needs some fixing.

First the numbers:

Code: Select all

Calculating square roots 0 to 10 000 000 on a 8 MHz ST

Fast Newton ASM  773 sec
Earx ASM        1544 sec
Nyh ASM         1555 sec
Nyh C           1575 sec
Fast newton C   1676 sec
tobe ASM        2133 sec
Newton C        2731 sec
float C         3290 sec
Except for the Earx routine all other routines produce correct answers for all numbers from 0 to 2^32-1. (I tested Fast Newton ASM, Nyh ASM, Nyh C, Fast Newton C and Newton C for correctness over the whole range.)

The Fast Newton ASM is by far the fastest. It is also the biggest one. It uses the Newton approximation for a SQRT: x_n+1 = 0.5*(x_n + Value/x_n). You get a better answer very fast. Every iteration the number of digits doubles. If your first gues is correct for 1 bit, the first iteration gives you 2, the 2nd 4, the 3rd 8 and the 4th 16 bits. There we stop because from a 32 bit number you can only get 16 a 16 bit integer as sqrt.

A disadvantage from the Newton method is that it is an aproximation. The last digit may be rounded up or down. The Nyh ASM routine always rounds down, so does tobe. The Nyh, Tobe and Earx routine are basicly the same method, Only the implementation is slightly different.

Well let us cut the crap and look at the code. First the Nyh ASM. It comes from the I_A addition lib. The code in the lib has a good explanation of the algorithm. I haev added it to the zipfile.

Code: Select all

sqrt_nyh1:
                move.l  d3,a0
                move.l  #$40000000,d1   ; mask
                moveq   #0,d2           ; result=0
loop:
                move.l  d2,d3           ; tmp=result
                add.l   d1,d3           ; tmp+=mask
                lsr.l   #1,d2           ; result>>=1
                cmp.l   d3,d0           ; x-tmp<0?
                bcs.s   cont            ; yep
                sub.l   d3,d0           ; x-=tmp
                add.l   d1,d2           ; result+=mask
cont:
                lsr.l   #2,d1           ; mask>>=2;
                bne.s   loop            ; nogmal
                move.l  d2,d0           ; d0=result
                move.l  a0,d3
                rts                     ; Einde
And here is the Fast Newton:

Code: Select all

; high speed sqrt
; call: DO = x, integer
; return: D0 = SQRT(x)

sqrt_strt:
     move.w  #$100,a0
     cmp.l   a0,d0        ; d0<256?
     bcs.s   .8bit        ; yep
     move.l  d0,a1        ; x
     swap    d0
     tst.w   d0
     bne.s   .32bit
     move.l  a1,d0
     moveq   #0,d1
.loop:
     addq.w  #1,d1        ;
.loop_start:              ; 16 bits left
     lsr.w   #2,d0        ;
     cmp.w   a0,d0
     bcc.s   .loop
     move.b  .sqrt_table(pc,d0.w),d0 ; table lookup
     lsl.w   d1,d0       ; order 0
     move.l  a1,d1       ; x
     divu    d0,d1       ; x/tmp
     add.w   d1,d0       ; tmp+=x/tmp
     lsr.w   #1,d0       ; tmp/=2            order 1
     move.l  a1,d1
     divu    d0,d1       ; x/tmp
     add.w   d1,d0       ; tmp+=x/tmp
     lsr.w   #1,d0       ; tmp/=2            order 2
     rts
.8bit:
     move.b  .sqrt_table(pc,d0.w),d0 ; table lookup
     rts
.32bit:
     cmp.w   a0,d0        ; 24 bit?
     bcc.s   .24bit
     move.l  a1,d0
     lsr.l   #8,d0
     moveq   #5,d1
     bra.s   .loop_start
.24bit:
     cmpi.w  #$3fc0,d0    ; bignum?
     bcc.s   .bignum      ; yep
     move.l  a1,d0
     clr.w   d0
     swap    d0
     moveq   #9,d1
     bra.s   .loop_start
.bignum:                  ; big nums cause overflows
     cmpi.w  #$fffd,d0    ; big bignum?
     bcc.s   .bigbignum   ; yep!
     move.l  a1,-(sp)     ; store x
     swap    d0
     lsr.l   #4,d0        ; x geen bignum meer
     move.l  d0,a1        ; x
     clr.w   d0
     swap    d0
     moveq   #9,d1
     bsr.s   .loop_start
     add.w   d0,d0
     add.w   d0,d0        ; d0 is our guess now
     move.l  (sp)+,d1     ; x
     divu    d0,d1        ; i/tmp
     bvc.s   .cont0
     moveq   #1,d1        ; overflow!
     bra.s   .cont1
.cont0:
     swap    d1
     clr.w   d1
.cont1:
     swap    d1
     add.l   d1,d0        ; tmp+=i/tmp
     lsr.l   #1,d0        ; tmp/=2       
     rts
.bigbignum:
     moveq   #0,d0
     subq.w  #1,d0        ; d0=$0000ffff
     rts

.sqrt_table:
dc.b 0,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3
dc.b 4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5
dc.b 5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6
dc.b 6,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
dc.b 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8
dc.b 8,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9
dc.b 9,9,9,9,10,10,10,10,10,10,10,10,10,10,10,10
dc.b 10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11
dc.b 11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11
dc.b 12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12
dc.b 12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13
dc.b 13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13
dc.b 13,13,13,13,14,14,14,14,14,14,14,14,14,14,14,14
dc.b 14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14
dc.b 14,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15
dc.b 15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15
I included all .C and .S files in the zip file. Hope you like it. I love this kind of work!

Fidelus Bittus et Fuckus Clockcyclus!

Hans Wessels
You do not have the required permissions to view the files attached to this post.
Last edited by Nyh on Mon Oct 18, 2004 7:50 pm, edited 1 time in total.
User avatar
tobe
Atari God
Atari God
Posts: 1459
Joined: Sat Jan 24, 2004 10:06 am
Location: Lyon, France
Contact:

Post by tobe »

Thanks a lot Nyh !
The Nyh ASM is the one i need, small and fast !
Great :D
step 1: introduce bug, step 2: fix bug, step 3: goto step 1.
User avatar
Nyh
Atari God
Atari God
Posts: 1533
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

Nyh ASM still faster...

Post by Nyh »

Do you think I have tested all. Write a final report, post it, review the code. Make some optimalisations and let it run again. For the 10 000 000 square roots the new routines needs only 1374 seconds. Let it run last night to test it. Here is the code:

Code: Select all

sqrt_nyh1:
                move.l  #$40000000,d1   ; mask
.loop0:
                cmp.l   d1,d0           ; x-tmp<0?
                bcs.s   .cont0          ; yep
                move.l  d1,d2           ; result=mask
                sub.l   d1,d0           ; x-=mask
                move.l  d3,a0
                lsr.l   #2,d1           ; mask>>=2
                bne.s   .loop1
                move.l  d2,d0           ; d0=result
                rts
.cont0:
                lsr.l   #2,d1           ; mask>>=2
                bne.s   .loop0
                rts
.loop1:
                move.l  d2,d3           ; tmp=result
                add.l   d1,d3           ; tmp+=mask
                lsr.l   #1,d2           ; result>>=1
                cmp.l   d3,d0           ; x-tmp<0?
                bcs.s   .cont1          ; yep
                sub.l   d3,d0           ; x-=tmp
                add.l   d1,d2           ; result+=mask
.cont1:
                lsr.l   #2,d1           ; mask>>=2;
                bne.s   .loop1          ; nogmal
                move.l  d2,d0           ; d0=result
                move.l  a0,d3
                rts                     ; Einde
Basically a added an extra loop to handle the case where the answer is still zero to eat the leading zeros. About doubling the codesize but, alas, not halfing teh runningtime.

Cu, Hans
User avatar
ggn
Atari God
Atari God
Posts: 1295
Joined: Sat Dec 28, 2002 4:49 pm

Post by ggn »

And here's another sqrt implementation. It is quite fast, but it fails a couple of sanity tests in Nyh's test shell, but here it is just for completeness! I don't know who coded it and where I got it from though...

Code: Select all

;Calculate square root of 32 bit Number in d0.l
;ENTRY d0.l holds longword to find square root of
;EXIT  d0.w holds 16 bit square root value

square_root	tst.l	d0
	beq	square_root2	if d0=0 then d0=square root 0

	movem.l	d1-d2,-(sp)
	move	#31,d2
square_root1	btst	d2,d0
	dbne	d2,square_root1
	lsr	#1,d2
	bset	d2,d2
	move.l	d0,d1
	divs	d2,d1
	add	d1,d2
	lsr	#1,d2
	move.l	d0,d1
	divs	d2,d1
	add	d1,d2
	
	lsr	#1,d2
	move.l	d0,d1
	divs	d2,d1
	add	d1,d2
	
	lsr	#1,d2
	clr.l	d0
	move	d2,d0
	movem.l	(sp)+,d1-d2
square_root2	rts
[/code]
is 73 Falcon patched atari games enough ? ^^
User avatar
Nyh
Atari God
Atari God
Posts: 1533
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

Post by Nyh »

ggn wrote:And here's another sqrt implementation. It is quite fast, but it fails a couple of sanity tests in Nyh's test shell, but here it is just for completeness! I don't know who coded it and where I got it from though...
Hi ggn,

Your code is yet an other implementation of Newtons algorithm like the code from the Stratos magazine. I like the use of the BTST and BSET instructions. I would have used a shift which is slower.

The use of the DIVS instruction makes the routine unusable for 32 bit numbers. Also the use of add.w makes that de squareroot must be smaller as $8000. That makes the upper limit for the square root 2^30. Because there are only three iterations you wil get only 8 bits correct in the answer.
Conclusion: it is not a 32 bit squareroot routine but only 16 bits. For up to 29 bits it will give a good guess, for 31 and 32 bits numbers it gives a very wrong answer.

Hans
User avatar
ggn
Atari God
Atari God
Posts: 1295
Joined: Sat Dec 28, 2002 4:49 pm

Post by ggn »

Nyh,

I have no choice but to agree on all your points because mainly I'm doing my navy service and I don't have enough time to read a lot about sqrt implementations :) As I said, I put it there only for the record, and because it was lying on my hdd.

Regards, George
is 73 Falcon patched atari games enough ? ^^
MetalGuru
Retro freak
Retro freak
Posts: 15
Joined: Thu Sep 02, 2004 9:17 am
Location: Sweden

Post by MetalGuru »

Here is another fast one....

*in d0.w 16.0 out d0.w 8.8
SQRT
movem.l d1-d3,-(sp)
move.l #$4000,d1
moveq #0,d2
moveq #0,d3

sqrt_l
add d1,d3
cmp.l d0,d3
bgt.s skip
sub.l d3,d0
move d1,d3
add d3,d3
or d3,d2

skip
add.l d0,d0
move d2,d3
lsr d1
bne.s sqrt_l
move d2,d0
movem.l (sp)+,d1-d3
rts



/MetalGuru
User avatar
IllegalException
Atariator
Atariator
Posts: 27
Joined: Thu Jan 19, 2006 2:33 pm

Post by IllegalException »

Nice going MetalGuru!

Keep up the SOTE-flag! ;)

By the way, do you have access to Donald's Human Race IV-version? Does it emulate nicely in STeem?

Cheers!
Illegal Exception
Scum Of The Earth (S.O.T.E.)
MetalGuru
Retro freak
Retro freak
Posts: 15
Joined: Thu Sep 02, 2004 9:17 am
Location: Sweden

Post by MetalGuru »

Download "The 20'th Anniversary Demo" released december 2005 and look for the hidden screen.....


Mvh MetalGuru/SOTE
User avatar
IllegalException
Atariator
Atariator
Posts: 27
Joined: Thu Jan 19, 2006 2:33 pm

Post by IllegalException »

Cool...will do!

Still at work?
Illegal Exception
Scum Of The Earth (S.O.T.E.)
User avatar
gloky
Captain Atari
Captain Atari
Posts: 203
Joined: Tue Dec 10, 2002 8:24 pm
Location: berck plage

Post by gloky »

hello
to calculate sqrt i don't know how to, but to do a table of sqrt of 64kb
there is a simple methode

let's analyse a sqrt table:

x: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
sqrt 0 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 4
in fact there are 1x'0', 3x '1', 5x'2', 7x'3', 9 x '4', etc
so the algo is

tl=1
nb=0 ;value of sqrt x
off=0 ; x

do
for t=off to off+tl do tab[off]=nb
off=off+tl
tl=tl+2
nb=nb+1
until offset is 64k

etonning, not ? :)

next t
;
; gloky 2009
;

zyprexa 10, 1 matin 1 soir
tercian 1 matin 1 midi 1 soir 1 couché
vitamine b1b6 2, 2, 0,0
cymbalta 1 le matin
equanil 400 2 matin 2 midi 1 soir 1 couché
Post Reply

Return to “680x0”