[CODE] Fast Sqrt rout

All 680x0 related coding posts in this section please.

Moderators: simonsunnyboy, Mug UK, Zorro 2, Moderator Team

User avatar
tobe
Atari God
Atari God
Posts: 1459
Joined: Sat Jan 24, 2004 10:06 am
Location: Lyon, France
Contact:

[CODE] Fast Sqrt rout

Postby tobe » Tue Oct 12, 2004 4:53 pm

Hello,

I found this rout on the web and converted it to assembler :

Code: Select all

;-----------------------------------------------------------------------------
;  original C code by Jim Ulery :
;      static unsigned mborg_isqrt3(unsigned long val)
;      {
;        unsigned long temp, g=0, b = 0x8000, bshft = 15;
;        do {
;          if (val >= (temp = (((g<<1)+b)<<bshft--))) {
;            g += b;
;            val -= temp;
;          }
;        } while (b >>= 1);
;        return g;
;      }
;-----------------------------------------------------------------------------

fast_sqrt:
   movem.l      d1-d4,         -(sp)
   moveq      #0,            d1         ; g = 0
   move.l      #$8000,         d2         ; b = $8000
   moveq      #15,         d3         ; bshft = 15
fast_sqrt_0:
   move.l      d1,            d4         ; temp = g
   add.l      d1,            d4         ; temp = g<<1
   add.l      d2,            d4         ; temp = (g<<1)+b
   lsl.l      d3,            d4         ; temp = ((g<<1)+b)<<bshft
   subq.l      #1,            d3         ; bshft--
   cmp.l      d0,            d4         ; val >= temp ?
   bhi.s      fast_sqrt_1               ;   no -> fast_sqrt_1
   add.l      d2,            d1         ; g += b
   sub.l      d4,            d0         ; val -= temp
fast_sqrt_1:
   lsr.l      #1,            d2         ; b >>= 1, b!=0 ?
   bne.s      fast_sqrt_0               ;   yes -> fast_sqrt_0
   move.l      d1,            d0         ; return g
   movem.l    (sp)+,         d1-d4
   rts


I didn't tried to optimize it further, but if someone can optimize it :)

edit : i forgot, put the value in D0 and retrieve the result in D0 ;)
step 1: introduce bug, step 2: fix bug, step 3: goto step 1.

User avatar
earx
Captain Atari
Captain Atari
Posts: 353
Joined: Wed Aug 27, 2003 7:09 am

Postby earx » Tue Oct 12, 2004 5:15 pm

if you use it on 030 it should be great, but beware of the lsl.l. it is quite costly on 68000, since every bit it shifts costs 2 cycles. there are algorithms where you only need to shift 1 bit at a time.

this might be a better one: input in d1, output in d0

moveq #1,d2
ror.l #2,d2
moveq #$F,d3
.loop1: cmp.l d2,d1
bgt.s .endloop1
add.l d1,d1
lsr.l #1,d2
dbf d3,.loop1
moveq #0,d0
bra.s .is_null
.endloop1:

sub.l d2,d1
move.l d2,d0
lsr.l #1,d2
.loop2: lsr.l #1,d2
add.l d2,d0
cmp.l d0,d1
bgt.s .endloop2
sub.l d2,d0
add.l d1,d1
dbf d3,.loop2
bra.s .end
.endloop2:

sub.l d0,d1
add.l d2,d0
add.l d1,d1
dbf d3,.loop2

.end: add.l d0,d0
addi.l #$00008000,d0
.is_null:

happy coding

User avatar
[ProToS]
Moderator
Moderator
Posts: 2242
Joined: Fri Sep 20, 2002 2:09 am
Location: Lourdes / France
Contact:

Postby [ProToS] » Tue Oct 12, 2004 10:05 pm

from Stratos issue 1 (french mag)
(sorry the article was i french and I wasn't a coder ... so I only reworte all and no translation)

perhaps somebody could find this usefull ...

Code: Select all

----------cut---------------------------------------

- deux versions - rapide ou précise,
- auccune boucle,
- une (ou deux) divisions,
- 11 (ou 15) instructions,
- 2.5% (ou 0%) d'erreur moyenne.

------------cut-------------------------------------

Voyons d'abord la version la plus rapide. on
entre une valeur en d0.l, et on en sort sa racine
par d0.w. on utilise également d1 et d2; L'erreur
moyenne est de 2.5% avec un maximum de 6% si le bit
de poids le plus fort de l'entrée est impair.

fastsqrt1:   
      moveq   #31,d2
      bfffo   d0{0:31},d1
      sub.l   d1,d2      ;y
      lsr.w   #1,d2      ;w
      bcc.s   .round0      ;arrondi
      addq.w   #1,d2
.round0:   
      move.l   d0,d1
      lsr.l   d2,d1      ;z0
      divu.l   d1,d0      ;itération
      add.l   d1,d0
      lsr.l   #1,d0
      rts
      
La version plus précise demande une itération
supplémentaire. L'utilisation est identique, mais
on utilise en plus d3. L'erreur moyenne est alors
mesurable, sauf pour $7FFFFFFF où il y a une erreur
de 0.2%.

fastsqrt2:   
      moveq.l   #31,d2
      move.l   d0,d3
      bfffo   d3{0:31},d1
      sub.l   d1,d2
      lsr.w   #1,d2
      bcc.s   .round0
      addq.w   #1,d2
.round0:   
      lsr.l   d2,d1      ;ordre 0
      divu.l   d1,d3      ;ordre 1
      add.l   d3,d1
      lsr.l   #1,d1
      divu.l   d1,d0      ;ordre 2
      add.l   d1,d0
      lsr.l   #1,d0
      rts
      
Conclusion

Voilà deux petites routines qui, je l'espère, vous
rendront de fiers services. Elles ont été réellement
testées et sont utilisées pour faire des FFT(transformé
rapide de Fourrier, qui sera aussi le sujet d'un prochain
article) avec un simple 68030 (sans dsp!).
SeeU
[ProToS]/Facebook

User avatar
tobe
Atari God
Atari God
Posts: 1459
Joined: Sat Jan 24, 2004 10:06 am
Location: Lyon, France
Contact:

Postby tobe » Wed Oct 13, 2004 11:56 am

Yes, my code is for 68000 :/

But i only use it to prepare a small table.

Thanks for all this routs ! Maybe someone should test and compare them !

Maybe there's some more interresting ones here :
http://www.azillionmonkeys.com/qed/sqroot.html
step 1: introduce bug, step 2: fix bug, step 3: goto step 1.

User avatar
Nyh
Atari God
Atari God
Posts: 1496
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

Small SQRT

Postby Nyh » Wed Oct 13, 2004 6:59 pm

Well, this SQRT routine is not very fast. But it is very small:

; Call: D0 long int
; Return: SQRT(D0)
;
moveq #-1,d1
loop:
addq.l #2,d1
sub.l d1,d0
bcc.s loop
lsr.l #1,d1
move.l d1,d0
rts

Hans Wessels

User avatar
GT Turbo
Captain Atari
Captain Atari
Posts: 335
Joined: Tue Feb 17, 2004 9:41 am
Location: Alsace, France
Contact:

Postby GT Turbo » Wed Oct 13, 2004 7:04 pm

Very short Nyh !!



GT Turbo :wink:
Never forget : Power is in your minds !!!

http://Cerebral-Vortex.net

http://Jagware.org

User avatar
earx
Captain Atari
Captain Atari
Posts: 353
Joined: Wed Aug 27, 2003 7:09 am

Postby earx » Thu Oct 14, 2004 11:28 am

yep, mine is fastest, but mrni's is shortest ;-) (welcome back MrNI! :-) )

User avatar
tobe
Atari God
Atari God
Posts: 1459
Joined: Sat Jan 24, 2004 10:06 am
Location: Lyon, France
Contact:

Postby tobe » Thu Oct 14, 2004 11:42 am

earx wrote:yep, mine is fastest, but mrni's is shortest ;-) (welcome back MrNI! :-) )

Yes :)
A short one is nice too !

I'm going to test this routs today.
step 1: introduce bug, step 2: fix bug, step 3: goto step 1.

User avatar
tobe
Atari God
Atari God
Posts: 1459
Joined: Sat Jan 24, 2004 10:06 am
Location: Lyon, France
Contact:

Postby tobe » Thu Oct 14, 2004 5:27 pm

Tests were done with a STE 1mb.
The routs posted by Protos use 68020+ instructions so i didn't test them.

Earx's rout is the faster, and the more accurate !

I did some modifications to make it works in the same manner than the others routs (in/out in d0, result in low word).

modified rout :

Code: Select all

fast_sqrt:
   moveq      #1,      d2
   ror.l      #2,      d2
   moveq      #$F,      d3
.loop1:
   cmp.l      d2,      d0
   bgt.s      .endloop1
   add.l      d0,      d0
   lsr.l      #1,      d2
   dbf      d3,.loop1
   moveq      #0,      d1
   bra.s      .is_null
.endloop1:
   sub.l      d2,      d0
   move.l      d2,      d1
   lsr.l      #1,      d2
.loop2:
   lsr.l      #1,      d2
   add.l      d2,      d1
   cmp.l      d1,      d0
   bgt.s      .endloop2
   sub.l      d2,      d1
   add.l      d0,      d0
   dbf      d3,.loop2
   bra.s      .end
.endloop2:
   sub.l      d1,      d0
   add.l      d2,      d1
   add.l      d0,      d0
   dbf      d3,.loop2
.end:
   add.l      d1,      d1
   addi.l      #$00008000,   d1
.is_null:
   swap      d1
   move.w      d1,      d0
   rts


Routs size :

Tobe : 36 bytes
Earx modified : 70 bytes
Nyh : 14 bytes

Comparative timings :

Tobe : 37820
Earx modified : 29241
Nyh : 93123

Source code :
You do not have the required permissions to view the files attached to this post.
step 1: introduce bug, step 2: fix bug, step 3: goto step 1.

User avatar
Nyh
Atari God
Atari God
Posts: 1496
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

Postby Nyh » Thu Oct 14, 2004 9:25 pm

earx wrote:yep, mine is fastest, but mrni's is shortest ;-) (welcome back MrNI! :-) )


I will dig up my optimised routine this weekend (hope my Atari HD still works...). Let us see wether yours is still the fastest then. :-)

I have been away a long time. Wanted to do some coding again. C programming under Linux is not as much fun as some nice 68k assembly on the Atari. I was very happy with Steem to able run Pure C. Lots of improvement since I tried Steem some years ago.

Maybe I will finish some demo ideas I had some years ago.

Cu, Hans

User avatar
earx
Captain Atari
Captain Atari
Posts: 353
Joined: Wed Aug 27, 2003 7:09 am

Postby earx » Fri Oct 15, 2004 2:29 pm

MrNi: I dunno whether mine is the fastest. The one you showed at siliconvention seemed pretty good in the respect that it could also do sub integer, whereas mine only does a round. Would be nice to take a look at that one. :-)

Tobe: if you don't care about diskspace and do care about precalc time, it might even be a good idea to precalc the sqrt table using gfa.. ..and all this code boosting is really not required. But who cares, right ;-)

User avatar
Nyh
Atari God
Atari God
Posts: 1496
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

Postby Nyh » Fri Oct 15, 2004 10:55 pm

earx wrote:MrNi: I dunno whether mine is the fastest. The one you showed at siliconvention seemed pretty good in the respect that it could also do sub integer, whereas mine only does a round. Would be nice to take a look at that one. :-)


Earx: there are some problems with your code.
sqrt(1) = 0??
and sqrt($77654321) gives also quite a wrong answer. :-(

I have converted the Stratos code to 68k asemly. I don't get their first guess. It is quite wrong IMHO. Also the number of iterations is a bit optimistic.

Code: Select all

sqrt_strt:
     cmpi.w    #2,d0       ; >2? dan wat doen, anders direct klaar
     bcc.s     .cont
     rts
.cont:
     move.l    d2,-(sp)
     moveq     #32,d2      ; bitcount
     move.l    d0,d1
     bmi.s     .start_l
.loop:
     subq.w    #1,d2       ;
     add.l     d1,d1       ; find bit
     bpl.s     .loop
.start_l:     
     lsr.w     #1,d2
     move.l    d0,d1       ; i
     lsr.l     d2,d1       ; tmp = i>>(bits/2) order 0
     move.l    d0,d2       ; i
     divu      d1,d2       ; i/tmp
     swap      d2
     clr.w     d2
     swap      d2
     add.l     d2,d1       ; tmp+=i/tmp
     lsr.l     #1,d1       ; tmp/=2            order 1
     move.l    d0,d2       ; i
     divu      d1,d2       ; i/tmp
     swap      d2
     clr.w     d2
     swap      d2
     add.l     d2,d1       ; tmp+=i/tmp
     lsr.l     #1,d1       ; tmp/=2            order 2
     move.l    d0,d2       ; i
     divu      d1,d2       ; i/tmp
     swap      d2
     clr.w     d2
     swap      d2
     add.l     d2,d1       ; tmp+=i/tmp
     lsr.l     #1,d1       ; tmp/=2            order 3
     move.l    d1,d0
     move.l    (sp)+,d2
     rts


You can still do a lot of optimising. For numbers <2^30 you can use word aritmethic for the iterations like this:

Code: Select all

     move.l    d0,d3       ; i
     divu      d1,d3       ; i/tmp
     add.w     d3,d1       ; tmp+=i/tmp
     lsr.w     #1,d1       ; tmp/=2            order 1

You can make the bitcount a lot more efficient. Use for the first guess a 256 byte lookup table and you will about double the speed. Is there any need for this?

User avatar
Shazz
Atari Super Hero
Atari Super Hero
Posts: 576
Joined: Wed Aug 27, 2003 9:27 am
Location: Crétin des Alpes dauphinoises

Postby Shazz » Sun Oct 17, 2004 1:10 pm

Eh eh, I was sure that a day my Atari fellows will do this kind of compo (By the way I won't try to beat Earx's TC line routine :D )

Tobe, as your goal is to build a lookup table you can have a look to our amy cousins : http://membres.lycos.fr/amycoders/compo/sqrtcompo.html

The competitions aim was to optimize a routine calculating a 64k sqrt-tab by size. The routine with the least bytes used was supposed to win.

Rules:

* Only 68020++ integer asm-commands.
* 65536 one byte values have to be calculated (64kbyte).
* The given range is 0..65535.
* The tables destination is a label called "sqrttab" in a bss-area.
* All registers are in an undefined state, when the routine is called.
* The routine has to be finished with "RTS", which is not accounted to the overall routine-length.
* All fraction parts have to be cut off (no rounding) (2.7 gets 2 then for example).


I did not check if all the routs use non compatible 68000 instructions...
...8bits are enough...

User avatar
tobe
Atari God
Atari God
Posts: 1459
Joined: Sat Jan 24, 2004 10:06 am
Location: Lyon, France
Contact:

Postby tobe » Sun Oct 17, 2004 8:21 pm

I'm not trying to build an sqrt lookup table, but a table from the results of calls to sqrt. (and a table of 65536 words is quite big for my needs ;))

This is my first screen coded only with 68k assembler, and i'm trying to optimize both size and speed, for the fun !

Now i need to test this rout from Nyh.
step 1: introduce bug, step 2: fix bug, step 3: goto step 1.

User avatar
Nyh
Atari God
Atari God
Posts: 1496
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

SQRT the fastest

Postby Nyh » Sun Oct 17, 2004 8:22 pm

Hi,

I dug up my SQRT routines and played aroud with the Newton approximations from the Stratos magazine. I got some interesting results.

There is a slight problem in the Earx routine. sqrt(1) is calculated with the wrong answer and so is sqrt(2003125025). The routine doesn't like numbers above 2^31. The routine needs some fixing.

First the numbers:

Code: Select all

Calculating square roots 0 to 10 000 000 on a 8 MHz ST

Fast Newton ASM  773 sec
Earx ASM        1544 sec
Nyh ASM         1555 sec
Nyh C           1575 sec
Fast newton C   1676 sec
tobe ASM        2133 sec
Newton C        2731 sec
float C         3290 sec

Except for the Earx routine all other routines produce correct answers for all numbers from 0 to 2^32-1. (I tested Fast Newton ASM, Nyh ASM, Nyh C, Fast Newton C and Newton C for correctness over the whole range.)

The Fast Newton ASM is by far the fastest. It is also the biggest one. It uses the Newton approximation for a SQRT: x_n+1 = 0.5*(x_n + Value/x_n). You get a better answer very fast. Every iteration the number of digits doubles. If your first gues is correct for 1 bit, the first iteration gives you 2, the 2nd 4, the 3rd 8 and the 4th 16 bits. There we stop because from a 32 bit number you can only get 16 a 16 bit integer as sqrt.

A disadvantage from the Newton method is that it is an aproximation. The last digit may be rounded up or down. The Nyh ASM routine always rounds down, so does tobe. The Nyh, Tobe and Earx routine are basicly the same method, Only the implementation is slightly different.

Well let us cut the crap and look at the code. First the Nyh ASM. It comes from the I_A addition lib. The code in the lib has a good explanation of the algorithm. I haev added it to the zipfile.

Code: Select all

sqrt_nyh1:
                move.l  d3,a0
                move.l  #$40000000,d1   ; mask
                moveq   #0,d2           ; result=0
loop:
                move.l  d2,d3           ; tmp=result
                add.l   d1,d3           ; tmp+=mask
                lsr.l   #1,d2           ; result>>=1
                cmp.l   d3,d0           ; x-tmp<0?
                bcs.s   cont            ; yep
                sub.l   d3,d0           ; x-=tmp
                add.l   d1,d2           ; result+=mask
cont:
                lsr.l   #2,d1           ; mask>>=2;
                bne.s   loop            ; nogmal
                move.l  d2,d0           ; d0=result
                move.l  a0,d3
                rts                     ; Einde


And here is the Fast Newton:

Code: Select all

; high speed sqrt
; call: DO = x, integer
; return: D0 = SQRT(x)

sqrt_strt:
     move.w  #$100,a0
     cmp.l   a0,d0        ; d0<256?
     bcs.s   .8bit        ; yep
     move.l  d0,a1        ; x
     swap    d0
     tst.w   d0
     bne.s   .32bit
     move.l  a1,d0
     moveq   #0,d1
.loop:
     addq.w  #1,d1        ;
.loop_start:              ; 16 bits left
     lsr.w   #2,d0        ;
     cmp.w   a0,d0
     bcc.s   .loop
     move.b  .sqrt_table(pc,d0.w),d0 ; table lookup
     lsl.w   d1,d0       ; order 0
     move.l  a1,d1       ; x
     divu    d0,d1       ; x/tmp
     add.w   d1,d0       ; tmp+=x/tmp
     lsr.w   #1,d0       ; tmp/=2            order 1
     move.l  a1,d1
     divu    d0,d1       ; x/tmp
     add.w   d1,d0       ; tmp+=x/tmp
     lsr.w   #1,d0       ; tmp/=2            order 2
     rts
.8bit:
     move.b  .sqrt_table(pc,d0.w),d0 ; table lookup
     rts
.32bit:
     cmp.w   a0,d0        ; 24 bit?
     bcc.s   .24bit
     move.l  a1,d0
     lsr.l   #8,d0
     moveq   #5,d1
     bra.s   .loop_start
.24bit:
     cmpi.w  #$3fc0,d0    ; bignum?
     bcc.s   .bignum      ; yep
     move.l  a1,d0
     clr.w   d0
     swap    d0
     moveq   #9,d1
     bra.s   .loop_start
.bignum:                  ; big nums cause overflows
     cmpi.w  #$fffd,d0    ; big bignum?
     bcc.s   .bigbignum   ; yep!
     move.l  a1,-(sp)     ; store x
     swap    d0
     lsr.l   #4,d0        ; x geen bignum meer
     move.l  d0,a1        ; x
     clr.w   d0
     swap    d0
     moveq   #9,d1
     bsr.s   .loop_start
     add.w   d0,d0
     add.w   d0,d0        ; d0 is our guess now
     move.l  (sp)+,d1     ; x
     divu    d0,d1        ; i/tmp
     bvc.s   .cont0
     moveq   #1,d1        ; overflow!
     bra.s   .cont1
.cont0:
     swap    d1
     clr.w   d1
.cont1:
     swap    d1
     add.l   d1,d0        ; tmp+=i/tmp
     lsr.l   #1,d0        ; tmp/=2       
     rts
.bigbignum:
     moveq   #0,d0
     subq.w  #1,d0        ; d0=$0000ffff
     rts

.sqrt_table:
dc.b 0,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3
dc.b 4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5
dc.b 5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6
dc.b 6,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
dc.b 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8
dc.b 8,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9
dc.b 9,9,9,9,10,10,10,10,10,10,10,10,10,10,10,10
dc.b 10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11
dc.b 11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11
dc.b 12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12
dc.b 12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13
dc.b 13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13
dc.b 13,13,13,13,14,14,14,14,14,14,14,14,14,14,14,14
dc.b 14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14
dc.b 14,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15
dc.b 15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15


I included all .C and .S files in the zip file. Hope you like it. I love this kind of work!

Fidelus Bittus et Fuckus Clockcyclus!

Hans Wessels
You do not have the required permissions to view the files attached to this post.
Last edited by Nyh on Mon Oct 18, 2004 7:50 pm, edited 1 time in total.

User avatar
tobe
Atari God
Atari God
Posts: 1459
Joined: Sat Jan 24, 2004 10:06 am
Location: Lyon, France
Contact:

Postby tobe » Sun Oct 17, 2004 8:26 pm

Thanks a lot Nyh !
The Nyh ASM is the one i need, small and fast !
Great :D
step 1: introduce bug, step 2: fix bug, step 3: goto step 1.

User avatar
Nyh
Atari God
Atari God
Posts: 1496
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

Nyh ASM still faster...

Postby Nyh » Mon Oct 18, 2004 7:36 am

Do you think I have tested all. Write a final report, post it, review the code. Make some optimalisations and let it run again. For the 10 000 000 square roots the new routines needs only 1374 seconds. Let it run last night to test it. Here is the code:

Code: Select all

sqrt_nyh1:
                move.l  #$40000000,d1   ; mask
.loop0:
                cmp.l   d1,d0           ; x-tmp<0?
                bcs.s   .cont0          ; yep
                move.l  d1,d2           ; result=mask
                sub.l   d1,d0           ; x-=mask
                move.l  d3,a0
                lsr.l   #2,d1           ; mask>>=2
                bne.s   .loop1
                move.l  d2,d0           ; d0=result
                rts
.cont0:
                lsr.l   #2,d1           ; mask>>=2
                bne.s   .loop0
                rts
.loop1:
                move.l  d2,d3           ; tmp=result
                add.l   d1,d3           ; tmp+=mask
                lsr.l   #1,d2           ; result>>=1
                cmp.l   d3,d0           ; x-tmp<0?
                bcs.s   .cont1          ; yep
                sub.l   d3,d0           ; x-=tmp
                add.l   d1,d2           ; result+=mask
.cont1:
                lsr.l   #2,d1           ; mask>>=2;
                bne.s   .loop1          ; nogmal
                move.l  d2,d0           ; d0=result
                move.l  a0,d3
                rts                     ; Einde

Basically a added an extra loop to handle the case where the answer is still zero to eat the leading zeros. About doubling the codesize but, alas, not halfing teh runningtime.

Cu, Hans

User avatar
ggn
Atari God
Atari God
Posts: 1258
Joined: Sat Dec 28, 2002 4:49 pm

Postby ggn » Tue Oct 26, 2004 1:57 pm

And here's another sqrt implementation. It is quite fast, but it fails a couple of sanity tests in Nyh's test shell, but here it is just for completeness! I don't know who coded it and where I got it from though...

Code: Select all

;Calculate square root of 32 bit Number in d0.l
;ENTRY d0.l holds longword to find square root of
;EXIT  d0.w holds 16 bit square root value

square_root   tst.l   d0
   beq   square_root2   if d0=0 then d0=square root 0

   movem.l   d1-d2,-(sp)
   move   #31,d2
square_root1   btst   d2,d0
   dbne   d2,square_root1
   lsr   #1,d2
   bset   d2,d2
   move.l   d0,d1
   divs   d2,d1
   add   d1,d2
   lsr   #1,d2
   move.l   d0,d1
   divs   d2,d1
   add   d1,d2
   
   lsr   #1,d2
   move.l   d0,d1
   divs   d2,d1
   add   d1,d2
   
   lsr   #1,d2
   clr.l   d0
   move   d2,d0
   movem.l   (sp)+,d1-d2
square_root2   rts

[/code]
is 73 Falcon patched atari games enough ? ^^

User avatar
Nyh
Atari God
Atari God
Posts: 1496
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

Postby Nyh » Tue Oct 26, 2004 8:28 pm

ggn wrote:And here's another sqrt implementation. It is quite fast, but it fails a couple of sanity tests in Nyh's test shell, but here it is just for completeness! I don't know who coded it and where I got it from though...


Hi ggn,

Your code is yet an other implementation of Newtons algorithm like the code from the Stratos magazine. I like the use of the BTST and BSET instructions. I would have used a shift which is slower.

The use of the DIVS instruction makes the routine unusable for 32 bit numbers. Also the use of add.w makes that de squareroot must be smaller as $8000. That makes the upper limit for the square root 2^30. Because there are only three iterations you wil get only 8 bits correct in the answer.
Conclusion: it is not a 32 bit squareroot routine but only 16 bits. For up to 29 bits it will give a good guess, for 31 and 32 bits numbers it gives a very wrong answer.

Hans

User avatar
ggn
Atari God
Atari God
Posts: 1258
Joined: Sat Dec 28, 2002 4:49 pm

Postby ggn » Thu Oct 28, 2004 12:58 pm

Nyh,

I have no choice but to agree on all your points because mainly I'm doing my navy service and I don't have enough time to read a lot about sqrt implementations :) As I said, I put it there only for the record, and because it was lying on my hdd.

Regards, George
is 73 Falcon patched atari games enough ? ^^

MetalGuru
Retro freak
Retro freak
Posts: 15
Joined: Thu Sep 02, 2004 9:17 am
Location: Sweden

Postby MetalGuru » Wed May 18, 2005 11:55 am

Here is another fast one....

*in d0.w 16.0 out d0.w 8.8
SQRT
movem.l d1-d3,-(sp)
move.l #$4000,d1
moveq #0,d2
moveq #0,d3

sqrt_l
add d1,d3
cmp.l d0,d3
bgt.s skip
sub.l d3,d0
move d1,d3
add d3,d3
or d3,d2

skip
add.l d0,d0
move d2,d3
lsr d1
bne.s sqrt_l
move d2,d0
movem.l (sp)+,d1-d3
rts



/MetalGuru

User avatar
IllegalException
Atariator
Atariator
Posts: 27
Joined: Thu Jan 19, 2006 2:33 pm

Postby IllegalException » Fri Jan 20, 2006 12:02 pm

Nice going MetalGuru!

Keep up the SOTE-flag! ;)

By the way, do you have access to Donald's Human Race IV-version? Does it emulate nicely in STeem?

Cheers!
Illegal Exception
Scum Of The Earth (S.O.T.E.)

MetalGuru
Retro freak
Retro freak
Posts: 15
Joined: Thu Sep 02, 2004 9:17 am
Location: Sweden

Postby MetalGuru » Fri Jan 20, 2006 2:01 pm

Download "The 20'th Anniversary Demo" released december 2005 and look for the hidden screen.....


Mvh MetalGuru/SOTE

User avatar
IllegalException
Atariator
Atariator
Posts: 27
Joined: Thu Jan 19, 2006 2:33 pm

Postby IllegalException » Fri Jan 20, 2006 4:49 pm

Cool...will do!

Still at work?
Illegal Exception

Scum Of The Earth (S.O.T.E.)

User avatar
gloky
Captain Atari
Captain Atari
Posts: 203
Joined: Tue Dec 10, 2002 8:24 pm
Location: berck plage

Postby gloky » Wed Mar 29, 2006 10:57 pm

hello
to calculate sqrt i don't know how to, but to do a table of sqrt of 64kb
there is a simple methode

let's analyse a sqrt table:

x: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
sqrt 0 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 4
in fact there are 1x'0', 3x '1', 5x'2', 7x'3', 9 x '4', etc
so the algo is

tl=1
nb=0 ;value of sqrt x
off=0 ; x

do
for t=off to off+tl do tab[off]=nb
off=off+tl
tl=tl+2
nb=nb+1
until offset is 64k

etonning, not ? :)

next t
;
; gloky 2009
;

zyprexa 10, 1 matin 1 soir
tercian 1 matin 1 midi 1 soir 1 couché
vitamine b1b6 2, 2, 0,0
cymbalta 1 le matin
equanil 400 2 matin 2 midi 1 soir 1 couché


Social Media

     

Return to “680x0”

Who is online

Users browsing this forum: No registered users and 3 guests