fast power function



 DEVELOP > c-Plus-Plus > fast power function

LINK TO THIS PAGE  


rating :  0   |  0


  Page 1 of 1

1

 
Topic: DEVELOP > c-Plus-Plus
User: "Chris Forone"
Date: 21 Jan 2008 04:14:28 AM
Object: fast power function
hello group,
is there some reference implementation of a fast power function? how
fast is the power function in <cmath>?
thanks & hand, chris
.

User: "=?iso-8859-1?q?Tom=E1s_=D3_h=C9ilidhe?="

Title: Re: fast power function 21 Jan 2008 05:44:24 AM
Chris Forone:

hello group,

is there some reference implementation of a fast power function? how
fast is the power function in <cmath>?

thanks & hand, chris

Very probably the fastest method of calculating powers for that platform.
--
Tomás Ó hÉilidhe
.
User: "Juha Nieminen"

Title: Re: fast power function 21 Jan 2008 09:59:00 AM
Tomás Ó hÉilidhe wrote:

Very probably the fastest method of calculating powers for that platform.

Not necessarily in all cases.
For example in intel architectures pow() is rather slow because
there's no such FPU opcode in 387. We are probably talking about many
hundreds, if not even over a thousand clock cycles even on a Pentium.
While compilers may try to optimize the pow() call away if they can,
they often can't.
In some cases it may be faster to "open up" a calculation than
calling pow(). For example, in many cases it may be slower to perform a
"pow(x, 1.5)" than a "x*sqrt(x)" (many compilers are unable to optimize
the former into the latter).
But of course this is more related to architectures and compilers than
to C++, and thus slightly off-topic.
.
User: "=?iso-8859-1?q?Tom=E1s_=D3_h=C9ilidhe?="

Title: Re: fast power function 21 Jan 2008 10:25:39 AM
Would it not make sense for the standard library to have a pow function
that deals only with integer exponents?
unsigned pow_int(unsigned const x,unsigned exp)
{
unsigned retval = 1;
while ( ; exp; --exp) retval *= x;
return retval;
}
--
Tomás Ó hÉilidhe
.
User: "Jerry Coffin"

Title: Re: fast power function 22 Jan 2008 10:46:02 AM
In article <Xns9A2CA71CBAD72toelavabitcom@194.125.133.14>,
toe@lavabit.com says...


Would it not make sense for the standard library to have a pow function
that deals only with integer exponents?

It already has such overloads, though there's no guarantee that they
work any differently than for floating point exponents.

unsigned pow_int(unsigned const x,unsigned exp)
{
unsigned retval = 1;

while ( ; exp; --exp) retval *= x;

return retval;
}

This isn't a particularly good algorithm unless exp is _quite_ small
(low double digits at most). For an N-bit exponent, this takes a maximum
of 2^N-1 multiplications. Assuming values of exp were randomly
distributed, it would take about 2^(N-1) multiplications on average.
You can do a lot better than that, taking roughly 2N multiplications in
the worst case, and something like 1.5N multiplications in the typical
case (actually, it should be somewhat less than that, but I haven't
bothered to figure up the exact number -- in any case, it's linear on
the number of bits instead of exponential.
Here's an implementation with some test code and instrumentation:
#include <vector>
#include <numeric>
unsigned mults;
template <class T>
T mult(T a, T b) {
mults++;
return a * b;
}
#define bits(object) (sizeof(object)*CHAR_BIT)
template <class T>
T my_pow(T X, unsigned Y) {
std::vector<T> sq;
sq.reserve(bits(Y));

T current_square(X);
mults = 0;
while (Y) {
if (Y&1)
sq.push_back(current_square);
Y>>=1;
current_square *= current_square;
mults++;
}
T result = std::accumulate(sq.begin(), sq.end(), T(1), mult<T>);
return result;
}
#ifdef TEST
#include <iostream>
#include <cmath>
int main() {
unsigned max_mults = 0;
unsigned total_mults = 0;
unsigned numbers = 0;
for (double i=0; i<10; i++)
for (int j=0; j<300; j++) {
std::cout << i << "^" << j << "=" << my_pow(i, j);
std::cout << "\t" << mults << " multiplications\n";
if (std::pow(i,j) != my_pow(i,j))
std::cout << "^Error!\n";
if (mults > max_mults)
max_mults = mults;
numbers ++;
total_mults += mults;
}
std::cout << "Maximum Multiplications: " << max_mults;
std::cout << "\nAverage Multipliations: " << (double)total_mults /
numbers;
return 0;
}
#endif
When I run this, I get a worst-case of 16 and an average of 11.23
multiplications per call. In every case, the result matches the standard
library exactly -- which gives a strong indication that the standard
library is implementing the code the same way. I'd expect to see at
least a few minor differences if the standard library were doing the job
by taking the logarithm and multiplying.
--
Later,
Jerry.
The universe is a figment of its own imagination.
.

User: "Pete Becker"

Title: Re: fast power function 21 Jan 2008 12:01:37 PM
On 2008-01-21 11:25:39 -0500, "Tomás Ó hÉilidhe" <toe@lavabit.com> said:


Would it not make sense for the standard library to have a pow function
that deals only with integer exponents?

It has three: pow(float, int), pow(double, int), and pow(long double, int).
--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)
.

User: "Juha Nieminen"

Title: Re: fast power function 21 Jan 2008 04:09:03 PM
Tomás Ó hÉilidhe wrote:

Would it not make sense for the standard library to have a pow function
that deals only with integer exponents?

Compilers do have specialized pow() functions when the exponent is
integer.

unsigned pow_int(unsigned const x,unsigned exp)
{
unsigned retval = 1;

while ( ; exp; --exp) retval *= x;

return retval;
}

That's *not* the fastest way of calculating pow(x, integer).
.
User: "James Kanze"

Title: Re: fast power function 22 Jan 2008 07:50:47 AM
On Jan 21, 11:09 pm, Juha Nieminen <nos...@thanks.invalid> wrote:

Tom=C3=A1s =C3=93 h=C3=89ilidhe wrote:

Would it not make sense for the standard library to have a pow funct=

ion

that deals only with integer exponents?

Compilers do have specialized pow() functions when the exponent is
integer.

unsigned pow_int(unsigned const x,unsigned exp)
{
unsigned retval =3D 1;
while ( ; exp; --exp) retval *=3D x;
return retval;
}

That's *not* the fastest way of calculating pow(x, integer).

Nor the most accurate.
--
James Kanze (GABI Software) mailto:james.kanze@gmail.com
Conseils en informatique orient=EF=BF=BDe objet/
Beratung in objektorientierter Datenverarbeitung
9 place S=EF=BF=BDmard, 78210 St.-Cyr-l'=EF=BF=BDcole, France, +33 (0)1 30 2=
3 00 34
.
User: "Lionel B"

Title: Re: fast power function 22 Jan 2008 08:49:41 AM
On Tue, 22 Jan 2008 05:50:47 -0800, James Kanze wrote:

On Jan 21, 11:09 pm, Juha Nieminen <nos...@thanks.invalid> wrote:

Tomás Ó hÉilidhe wrote:

Would it not make sense for the standard library to have a pow
function
that deals only with integer exponents?


Compilers do have specialized pow() functions when the exponent is
integer.


unsigned pow_int(unsigned const x,unsigned exp) {
unsigned retval = 1;


while ( ; exp; --exp) retval *= x;
return retval;
}


That's *not* the fastest way of calculating pow(x, integer).

It may or may not be...

Nor the most accurate.

Surely it's guaranteed 100% accurate* - it's integer arithmetic.
*provided retval doesn't overflow.
--
Lionel B
.
User: "James Kanze"

Title: Re: fast power function 23 Jan 2008 02:41:14 AM
On Jan 22, 3:49 pm, Lionel B <m...@privacy.net> wrote:

On Tue, 22 Jan 2008 05:50:47 -0800, James Kanze wrote:

On Jan 21, 11:09 pm, Juha Nieminen <nos...@thanks.invalid> wrote:

Tom=E1s =D3 h=C9ilidhe wrote:

Would it not make sense for the standard library to have a pow
function
that deals only with integer exponents?

Compilers do have specialized pow() functions when the exponent is
integer.

unsigned pow_int(unsigned const x,unsigned exp) {
unsigned retval =3D 1;
while ( ; exp; --exp) retval *=3D x;
return retval;
}

That's *not* the fastest way of calculating pow(x, integer).

It may or may not be...

It's not, by far.

Nor the most accurate.

Surely it's guaranteed 100% accurate* - it's integer arithmetic.

Oops. I missed that; I read the line "Compilers do have
specialized pow() functions when the exponent is integer", and
didn't look at the actual function parameters.
Using this function if the base is a floating point type, of
course, will result in very poor accuracy.
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orient=E9e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S=E9mard, 78210 St.-Cyr-l'=C9cole, France, +33 (0)1 30 23 00 34
.





User: "Jerry Coffin"

Title: Re: fast power function 22 Jan 2008 01:14:17 AM
In article <4794bdad$0$23838$4f793bc4@news.tdc.fi>,
nospam@thanks.invalid says...
[ ... ]

For example in intel architectures pow() is rather slow because
there's no such FPU opcode in 387. We are probably talking about many
hundreds, if not even over a thousand clock cycles even on a Pentium.
While compilers may try to optimize the pow() call away if they can,
they often can't.

I don't know of any processor that would let you implement pow in a
single instruction -- but Intel floating point includes instructions for
logarithm and inverse logarithm, which make pow pretty easy to
implement.
As far as speed goes, you should be looking at about 150-190 CPU cycles
on a reasonably modern CPU (depending somewhat on data). I haven't found
any data for which it takes 200 cycles on my machine, and it's a few
years old -- I believe current CPUs are typically at least 20-30% faster
at the same clock speed.
--
Later,
Jerry.
The universe is a figment of its own imagination.
.
User: "Juha Nieminen"

Title: Re: fast power function 22 Jan 2008 06:42:28 AM
Jerry Coffin wrote:

I don't know of any processor that would let you implement pow in a
single instruction -- but Intel floating point includes instructions for
logarithm and inverse logarithm, which make pow pretty easy to
implement.

Actually it's not "pretty easy", as there's a twist.
pow(x, y) = pow(2, y*log2(x)), and the 387 happens to have both an
opcode for calculating y*log2(x), fyl2x, and one for calculating
pow(2, x)-1, f2xm1. The twist is, however, that with the latter x must
be inside the range from -1.0 to +1.0. This means that the x in the
original pow(x, y) cannot be used as it is with f2xm1, but it must be
split using the property 2^(a+b) = 2^a*2^b.
The complete asm code for calculating pow(x, y), assuming x and y have
already been loaded into the FPU, is something like (with reported clock
cycles for the pentium):
fyl2x ; 22-111
fld st ; 1
frndint ; 9-20
fsub st(1), st ; 3
fxch st(2) ; 0-1
f2xm1 ; 13-57
fld1 ; 2
fadd ; 3
fscale ; 20-31
fstp st(1) ; 1
I suppose the absolute best case scenario is a total of 74 clock
cycles, and the worst is 230 clock cycles, with possible stalls.
I suppose I overestimated the required clock cycles.
.
User: "Jerry Coffin"

Title: Re: fast power function 22 Jan 2008 08:53:25 AM
In article <4795e4d9$0$14991$4f793bc4@news.tdc.fi>,
nospam@thanks.invalid says...

Jerry Coffin wrote:

I don't know of any processor that would let you implement pow in a
single instruction -- but Intel floating point includes instructions for
logarithm and inverse logarithm, which make pow pretty easy to
implement.


Actually it's not "pretty easy", as there's a twist.

Actually, I'm well aware of the "twists". At least by my figuring, it
still qualifies as "pretty easy".

pow(x, y) = pow(2, y*log2(x)), and the 387 happens to have both an
opcode for calculating y*log2(x), fyl2x, and one for calculating
pow(2, x)-1, f2xm1. The twist is, however, that with the latter x must
be inside the range from -1.0 to +1.0. This means that the x in the
original pow(x, y) cannot be used as it is with f2xm1, but it must be
split using the property 2^(a+b) = 2^a*2^b.

Yes -- and that doesn't stop it from being pretty easy.

The complete asm code for calculating pow(x, y), assuming x and y have
already been loaded into the FPU, is something like (with reported clock
cycles for the pentium):

fyl2x ; 22-111
fld st ; 1
frndint ; 9-20
fsub st(1), st ; 3
fxch st(2) ; 0-1
f2xm1 ; 13-57
fld1 ; 2
fadd ; 3
fscale ; 20-31
fstp st(1) ; 1

That's certainly similar to the code I've always used:
fyl2x
fld st(0)
frndint
fld st(1)
fsub st(0),st(1)
f2xm1
fld1
faddp st(1),st(0)
fxch st(1)
fld1
fscale
fxch st(1)
fstp st(0)
fmulp st(1),st(0)

I suppose the absolute best case scenario is a total of 74 clock
cycles, and the worst is 230 clock cycles, with possible stalls.

I suppose I overestimated the required clock cycles.

Especially since you seem to be basing your clock counts on the Pentium,
which is an _ancient_ processor. The Pentium was discontinued more than
a decade ago. Even if you obtain your computers from other people's
trash, you can something a lot newer by now.
To put things in perspective, using the code above, it took about half
again longer to load X and Y from memory into the FPU than it did to
compute X^Y after they were loaded.
--
Later,
Jerry.
The universe is a figment of its own imagination.
.





User: "Jerry Coffin"

Title: Re: fast power function 22 Jan 2008 01:14:18 AM
In article <fn1ra4$4qp$1@newsreader1.xoc.utanet.at>,
says...

hello group,

is there some reference implementation of a fast power function? how
fast is the power function in <cmath>?

The implementation in the library is likely to be oriented more toward
being general purpose than the fastest for a given situation. You can
probably do substantially better if (and only if) you know a fair amount
about the data you're working with, so you can write something more
specialized.
--
Later,
Jerry.
The universe is a figment of its own imagination.
.

User: "Chris Forone"

Title: Re: fast power function 23 Jan 2008 02:37:23 AM
Thaks a lot!
yours sincerely, chris
.


  Page 1 of 1

1

 


Related Articles
 

NEWER

pg.1232     pg.940     pg.716     pg.544     pg.412     pg.311     pg.234     pg.175     pg.130     pg.96     pg.70     pg.50     pg.35     pg.24     pg.16     pg.10     pg.6     pg.3     pg.1

OLDER