*From*: Les <hlhowell pacbell net>
*To*: For users of Fedora <fedora-list redhat com>
*Subject*: Re: OT : Approximate / fast math libraries ?
*Date*: Tue, 04 Sep 2007 17:32:45 -0700

On Tue, 2007-09-04 at 18:17 -0500, Mike McCarty wrote:
> Matthew Saltzman wrote:
> > On Sat, 2007-09-01 at 09:41 -0500, Michael Hennebry wrote:
> >
> >>How much precision do you need? On what? Why?
> >>
> >>At least one person wrote a book on implementing the C standard library.
> >>It would probably be a better resource than Numerical Recipes.
> >
> > That would be PJ Plauger's The Standard C Library, Prentice Hall, 1992
> > 0-13-131509-9. Most of his math lib implementation is based on Cody and
> > Waite, Software Manual for the Elementary Functions, Prentice Hall, 1980
> > (sorry, he doesn't give the ISBN).
>
> My copy of Cody & Waite is ISBN 0-13-822064-6. The exact title
> is "Software Manual for the Elementary Functions".
>
> I'm afraid I'm not very impressed with "Numerical Recipes".
> I bought a copy many years ago, and found some humorous lapses
> in the multi-precision FFT based math package. Things which
> proved that they don't know what they are doing, I'm afraid.
> Like subtracting one float from another repeatedly in a loop
> instead of using fmod().
>
> I've had good results with Cody & Waite, though it's getting
> somewhat dated (1980) and some better stuff has come along,
> or so I've heard.
>
> But, if the hardware is being used, then coding something with
> less accuracy is also going to be slower.
>
> Mike
> --

I do have to agree with your assessment of their algorithms. But having a working algorithm means I only have to find the optimizations. And sometimes what seems archaic may be able to take advantage of compiler and processor optimizations to achieve faster results. A subtraction costs about one cycle, whereas fmod takes multiple cycles to begin with, plus call and return overhead. If the number of iterations is known to be small, repeated subtraction may be faster. Only some practical work with the algorithm will tell you the real results.

The same is true of floating point operations vs. integers.
When floats had to be calculated by loops on an integer processor, they were expensive and integers were faster. Now, with high-speed floating point units, simple float operations are quite fast if done inline. Ditto for doubles. It costs no more to calculate doubles than singles, except when you store and retrieve them on a 32-bit machine (if your blocking is set right). On a 64-bit machine, doubles may actually be faster, since you don't have to truncate or do the store offsets (note that this depends on the hardware implementation inside the processor and the microcode used for the double and storage calculations).

With some operations, the behavior of the RAM may be important because of cycle overhead. Processors with high I/O bandwidth work well as a single CPU, but take a hit in a dual-CPU setup because they cannot overlap cycles as effectively. They may well be faster, but it depends a lot on the algorithm and on the relative timing of the calculation and results, some of which the programmer can control directly and some of which may be limited by the processor design. Ditto for memory access. DDR RAM can do I/O overlap if the processor and motherboard electronics can handle it. So dual processors with certain addressing setups can both run at full speed and overlapped, if the other I/O functions can support it.

When discussing algorithm timing for real-time applications, only the algorithm being used and its variants can be discussed. This is one reason that benchmarks are basically useless in choosing a processor for real-time applications, unless you are using the benchmark algorithm in your specific application.

I have spent many, many hours optimizing code. I have gradually come to the conclusion that I can optimize my own code by 10 to 20 percent each cycle I put into it, along with some additional benefit from hardware advances in each cycle. This is especially true now that the hardware cycle is into the 13-14 month timeframe.
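Since the only benchmark that counts is your own algorithm on your own hardware, a rough way to check the float-versus-double question is to time the same kernel in both precisions. The sketch below is only that. The kernel (a sum of squares) and the names `sum_squares_f`, `sum_squares_d` and `time_float_vs_double` are invented for illustration, and `clock()` measures CPU time at fairly coarse resolution, so use enough repetitions to get a measurable interval.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical kernel, single precision. */
static float sum_squares_f(const float *v, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += v[i] * v[i];
    return acc;
}

/* The same kernel, double precision. */
static double sum_squares_d(const double *v, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += v[i] * v[i];
    return acc;
}

/* Time `reps` runs of each kernel over the same n values.
   Writes CPU seconds to *sec_f and *sec_d.
   Returns 0 on success, -1 if memory could not be allocated. */
static int time_float_vs_double(size_t n, int reps, double *sec_f, double *sec_d)
{
    assert(n > 0 && reps > 0);
    float *vf = malloc(n * sizeof *vf);
    double *vd = malloc(n * sizeof *vd);
    if (!vf || !vd) {
        free(vf);
        free(vd);
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        vd[i] = (double)(i % 100) / 100.0;
        vf[i] = (float)vd[i];
    }

    /* Volatile sinks discourage the optimizer from discarding the work.
       If the timings look too good, inspect the generated code. */
    volatile float sink_f = 0.0f;
    volatile double sink_d = 0.0;

    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        sink_f = sum_squares_f(vf, n);
    *sec_f = (double)(clock() - t0) / CLOCKS_PER_SEC;

    t0 = clock();
    for (int r = 0; r < reps; r++)
        sink_d = sum_squares_d(vd, n);
    *sec_d = (double)(clock() - t0) / CLOCKS_PER_SEC;

    (void)sink_f;
    (void)sink_d;
    free(vf);
    free(vd);
    return 0;
}
```

Calling `time_float_vs_double(1000000, 100, &tf, &td)` and printing `tf` and `td` gives one data point for one compiler, one set of flags and one machine. Swap in your real inner loop before drawing any conclusion from it.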
Sometimes just taking a different perspective on the problem will help. I have chased various trig functions for a while, then figured out a relative way to achieve the same effective results with less overhead, or sometimes in hardware that cost less than $40.00. Using a phase comparator to find angular offsets is one example. Sometimes a good step back and a look at the ultimate goal will help as well. You may find that the results are more about the value of the peak than the angular offset, for example. Or an FIR filter may be faster than an FFT in some cases to find a specific frequency value.

There are more ways to tackle problems than most of us can imagine. That is why we call upon the local wizards to help us.

Regards,
Les H

**Follow-Ups**:

**Re: OT : Approximate / fast math libraries ?** *From:* Bruno Wolff III

**Re: OT : Approximate / fast math libraries ?** *From:* Mike McCarty

**References**:

**Re: OT : Approximate / fast math libraries ?** *From:* Michael Hennebry

**Re: OT : Approximate / fast math libraries ?** *From:* Matthew Saltzman

**Re: OT : Approximate / fast math libraries ?** *From:* Mike McCarty