Hi

How can I calculate sqrt using assembler.
And insert this assembler code to C++ code.
Will it work really quicker than using just C++.

--
Vadim

Re: Assembler. Optimazation. sqrt by Victor

Victor
Wed Feb 23 09:28:45 CST 2005

÷ÁÄÉÍ óÔÁÒÏÓÔÉÎ wrote:
> How can I calculate sqrt using assembler.
> And insert this assembler code to C++ code.

It's a freaking CPU instruction. Do you really need to know?

> Will it work really quicker than using just C++.

No. Maybe. What are you going to gain if the difference between
them is, say, a few CPU ticks?

Profile your program and find the performance bottlenecks before
attempting something as drastic as using inline assembly.

V

Re: Assembler. Optimazation. sqrt by Mark

Mark
Wed Feb 23 23:17:01 CST 2005

÷ÁÄÉÍ óÔÁÒÏÓÔÉÎ wrote:
> Hi
>
> How can I calculate sqrt using assembler.
fsqrt on i386

> And insert this assembler code to C++ code.
__asm for MSVC

> Will it work really quicker than using just C++.
No, if you use /O2 (tested with VC7.1)

int main()
{
double f = 2.0;
f = sqrt(2);
printf("%f\n", f);
return 0;
}

is

_main PROC NEAR
fld QWORD PTR __real@4000000000000000
sub esp, 8
fsqrt
fstp QWORD PTR [esp]
push OFFSET FLAT:??_C@_03PPOCCAPH@?$CFf?6?$AA@
call _printf
add esp, 12
xor eax, eax
ret 0

i.e. it gets inlined. Try profiling, it works wonders.

Re: Assembler. Optimazation. sqrt by Simon

Simon
Thu Feb 24 10:34:45 CST 2005

"Victor Bazarov" <v.Abazarov@comAcast.net> wrote in message
news:OVeYexbGFHA.2976@TK2MSFTNGP09.phx.gbl...
> Profile your program and find the performance bottlenecks before
> attempting something as drastic as using inline assembly.

And of course if it is a bottleneck then the best solution is likely to be
an algorithm change-- e.g. squaring all the algorithm's inputs, doing all
the calculations in the 'square' space, and then just taking the square
roots of the result.

Of course, one has then to make sure the algorithm is tolerably faithful to
the original.

If the argument and results are of integer type, then one can use an integer
square root algoirthm. A good explanation of one such algorithm is found
here: http://ww1.microchip.com/downloads/en/AppNotes/91040a.pdf



Re: Assembler. Optimazation. sqrt by Vadim

Vadim
Fri Feb 25 07:40:02 CST 2005

Thank you very much. This link is interesting.
So I can write that algorithm on C. It should be as fast as it is on
assembler?
Actually I need accelerate program. Any tricky thing is helpful for me.

Why I asked about assembler acceleration first?
I've got this quest during some test. So if you can give something about
it...

--
Vadim S.

"Simon Trew" <noneofyour@business.guv> ÓÏÏÂÝÉÌ/ÓÏÏÂÝÉÌÁ × ÎÏ×ÏÓÔÑÈ
ÓÌÅÄÕÀÝÅÅ: news:uyPcD7oGFHA.1176@TK2MSFTNGP12.phx.gbl...
> "Victor Bazarov" <v.Abazarov@comAcast.net> wrote in message
> news:OVeYexbGFHA.2976@TK2MSFTNGP09.phx.gbl...
> > Profile your program and find the performance bottlenecks before
> > attempting something as drastic as using inline assembly.
>
> And of course if it is a bottleneck then the best solution is likely to be
> an algorithm change-- e.g. squaring all the algorithm's inputs, doing all
> the calculations in the 'square' space, and then just taking the square
> roots of the result.
>
> Of course, one has then to make sure the algorithm is tolerably faithful
to
> the original.
>
> If the argument and results are of integer type, then one can use an
integer
> square root algoirthm. A good explanation of one such algorithm is found
> here: http://ww1.microchip.com/downloads/en/AppNotes/91040a.pdf
>
>



Re: Assembler. Optimazation. sqrt by Vadim

Vadim
Fri Feb 25 07:57:43 CST 2005

How can I get the other docs from that server(http://ww1.microchip.com/)?

--
Vadim S.

"Simon Trew" <noneofyour@business.guv> ÓÏÏÂÝÉÌ/ÓÏÏÂÝÉÌÁ × ÎÏ×ÏÓÔÑÈ
ÓÌÅÄÕÀÝÅÅ: news:uyPcD7oGFHA.1176@TK2MSFTNGP12.phx.gbl...
> "Victor Bazarov" <v.Abazarov@comAcast.net> wrote in message
> news:OVeYexbGFHA.2976@TK2MSFTNGP09.phx.gbl...
> > Profile your program and find the performance bottlenecks before
> > attempting something as drastic as using inline assembly.
>
> And of course if it is a bottleneck then the best solution is likely to be
> an algorithm change-- e.g. squaring all the algorithm's inputs, doing all
> the calculations in the 'square' space, and then just taking the square
> roots of the result.
>
> Of course, one has then to make sure the algorithm is tolerably faithful
to
> the original.
>
> If the argument and results are of integer type, then one can use an
integer
> square root algoirthm. A good explanation of one such algorithm is found
> here: http://ww1.microchip.com/downloads/en/AppNotes/91040a.pdf
>
>



Re: Assembler. Optimazation. sqrt by Carl

Carl
Fri Feb 25 08:54:20 CST 2005

Vadim S. wrote:
> Thank you very much. This link is interesting.
> So I can write that algorithm on C. It should be as fast as it is on
> assembler?

Just to be clear: If you need a full precision double square root, there's
nothing you can do to speed it up - the compiler already generates a single
instruction in release builds (fsqrt) and any implementation that you write,
in C++, C or assembler will be at best the same, but more likely many times
slower.

The opportunity for speed improvement comes if you don't need the full
precision of a double square root, or better still, if there's a way to
rewrite your algorithm so you call sqrt less often.

My guess is that you could code a 16 bit integer sqrt using the successive
approximation method outlined on the microchip site and have it come out
faster than a double-precision sqrt, but beyond that, you're probably out of
luck. Likewise, I'd expect guess that a 32 bit integer sqrt using the
successive approximation algorithm would end up slower than the double
precision built-in.

-cd



Re: Assembler. Optimazation. sqrt by Simon

Simon
Tue Mar 01 11:19:34 CST 2005

"Carl Daniel [VC++ MVP]" <cpdaniel_remove_this_and_nospam@mvps.org.nospam>
wrote in message news:%237QGkn0GFHA.2976@TK2MSFTNGP09.phx.gbl...
> The opportunity for speed improvement comes if you don't need the full
> precision of a double square root, or better still, if there's a way to
> rewrite your algorithm so you call sqrt less often.

Yes, as I said, we have seen real performance improvements by doing
everything in the 'squared' space.

Example to OP: lets say you have a collectionof right-angled triangles that
are storing base and height, and you want to find which has the longest
hypotenuse. You could do this:

struct { double base, height; } right_angled_triangle;
typedef std::vector<right_angled_triangle> right_angled_triangle_collection;

std::pair<right_angled_triangle_collection::iterator, double>
find_longest_hypotenuse(right_angled_triangle_collection::const_iterator
first, last)
{
right_angled_triangle_collection::const_iterator result = last;
double result_hypotenuse = 0.0;
for (right_angled_triangle_collection::const_iterator it = first; it !=
last; ++it)
{
double hypotenuse = sqrt(it->base * it->base + it->height *
it->height);
if ((result == NULL) || (hypotenuse > result_hypotenuse))
{
result = it; result_hypotenuse = hypotenuse;
}
}
return std::make_pair(result, result_hypotenuse);
}

However, it would be a lot quicker to do it like this:

std::pair<right_angled_triangle_collection::iterator, double>
find_longest_hypotenuse(right_angled_triangle_collection::const_iterator
first, last)
{
right_angled_triangle_collection::const_iterator result = last;
double result_square_of_hypotenuse = 0.0;
for (right_angled_triangle_collection::const_iterator it = first; it !=
last; ++it)
{
double square_of_hypotenuse = it->base * it->base + it->height *
it->height;
if ((result == NULL) || (square_of_hypotenuse >
result_square_of_hypotenuse))
{
result = it; result_square_of_hypotenuse = square_of_hypotenuse;
}
}
return std::make_pair(result, sqrt(result_square_of_hypotenuse)); // Only
one square root now, not one per triangle
}



Re: Assembler. Optimazation. sqrt by Vadim

Vadim
Sat Mar 05 15:37:14 CST 2005

I'm not interesting only about sqrt.
I need something easy to use about asm.
Something like:
for(register i=0; i<num; ++i)...

or address alignment

--
Thank you
Vadim S.

"Simon Trew" <noneofyour@business.guv> ÓÏÏÂÝÉÌ/ÓÏÏÂÝÉÌÁ × ÎÏ×ÏÓÔÑÈ
ÓÌÅÄÕÀÝÅÅ: news:uyPcD7oGFHA.1176@TK2MSFTNGP12.phx.gbl...
> "Victor Bazarov" <v.Abazarov@comAcast.net> wrote in message
> news:OVeYexbGFHA.2976@TK2MSFTNGP09.phx.gbl...
> > Profile your program and find the performance bottlenecks before
> > attempting something as drastic as using inline assembly.
>
> And of course if it is a bottleneck then the best solution is likely to be
> an algorithm change-- e.g. squaring all the algorithm's inputs, doing all
> the calculations in the 'square' space, and then just taking the square
> roots of the result.
>
> Of course, one has then to make sure the algorithm is tolerably faithful
to
> the original.
>
> If the argument and results are of integer type, then one can use an
integer
> square root algoirthm. A good explanation of one such algorithm is found
> here: http://ww1.microchip.com/downloads/en/AppNotes/91040a.pdf
>
>



Re: Assembler. Optimazation. sqrt by Scott

Scott
Sat Mar 05 15:47:13 CST 2005

Vadim S. wrote:
> I'm not interesting only about sqrt.
> I need something easy to use about asm.
> Something like:
> for(register i=0; i<num; ++i)...
>
> or address alignment
>

ASM is the processor's instruction set. It does not have anything easy
to use like 'for' or 'if'. It is machine language made readable.

--
Scott McPhillips [VC++ MVP]


Re: Assembler. Optimazation. sqrt by Jerry

Jerry
Sat Mar 12 22:38:29 CST 2005

In article <#GE$q$zGFHA.3968@TK2MSFTNGP14.phx.gbl>,
vadimstar@inbox.ru says...
> Thank you very much. This link is interesting.
> So I can write that algorithm on C. It should be as fast as it is on
> assembler?
> Actually I need accelerate program. Any tricky thing is helpful for me.
>
> Why I asked about assembler acceleration first?
> I've got this quest during some test. So if you can give something about
> it...

I doubt it'll be useful, but here's some assembly code to compute a
16-bit integer square root:

isqrt proc uses di, number:word
;
; destroys bx, cx, dx
;
mov di,number
mov ax,255
start_loop:
mov bx,ax
xor dx,dx
mov ax,di
div bx
add ax,bx
shr ax,1
mov cx,ax
sub cx,bx
cmp cx,2
ja start_loop
ret
isqrt endp

Unfortunately, the last time I checked, fsqrt was usually faster than
this (I wrote this back when f.p. hardware was quite rare).

I don't seem to have the original post in this thread, but if you are
using the square root to compute a hypotenuse, here's an algorithm
invented by Moler and Morrison that you might want to consider:

// dist.h
#ifndef DIST_H_INCLUDED
#define DIST_H_INCLUDED

#include <math.h>

double dist(double a, double b);

#endif

// dist.c[pp]
#include "dist.h"

/* Iterations Accuracy
* 2 6.5 digits
* 3 20 digits
* 4 62 digits
* assuming a numeric type able to maintain that degree of accuracy
* in the individual operations.
*/
#define ITER 3

double dist(double P, double Q) {
/* A reasonably robust method of calculating `sqrt(P*P + Q*Q)'
*
* Transliterated from _More Programming Pearls, Confessions of a
* Coder_ by Jon Bentley, pg. 156.
*/

double R;
int i;

P = fabs(P);
Q = fabs(Q);

if (P<Q) {
R = P;
P = Q;
Q = R;
}

/* The book has this as:
* if P = 0.0 return Q; in AWK
* However, this makes no sense to me - we've just insured that P>=Q,
* so P==0 only if Q==0; if Q==0, then distance == P...
*/
if ( Q == 0.0 )
return P;

for (i=0;i<ITER;i++) {
R = Q / P;
R = R * R;
R = R / (4.0 + R);
P = P + 2.0 * R * P;
Q = Q * R;
}
return P;
}

If you're computing roots a lot, you might want to pick up _More
Programming Pearls_ by Jon Bentley -- he devotes an entire chapter to
how he sped up a program that did the same (the algorithm above came
from there). He ended up using pretty much what's been advised
elsethread: work with the square of the distance instead of finding
the distance itself.

--
Later,
Jerry.

The universe is a figment of its own imagination.