Trigonometric operations on x86 and x64 CPUs

A while back I expressed surprise that there were no versions of the Elementary_Functions package for x86/x64 targets which used the built-in trig functions of the x86 family. It looked like an easy project but...

It turns out that the only trig functions available are the former x87 instructions, which use a special register stack. Not a problem--except that these instructions are not reachable in (64-bit) "Long Mode". You have to have a separate compilation unit that switches to one of the 32-bit modes. Okay... But you also need to support saving and restoring the x87 register stack. There is a protocol for doing that, but 64-bit code can't use it. You would think, "No problem, 64-bit programs can't execute x87 instructions anyway." Except that is exactly what you are trying to do! So you need to call a compatibility-mode procedure every time the CPU does a 64-bit context switch.

At this point, I was ready to give up on the project as a bad job. I could modify the stack save/restore code on Linux, but even then I would have a distributed cost. Then I ran across a page which explains that the Intel instructions use a 66-bit representation of Pi, and that as a result of the range-reduction folding done inside the instructions, some values end up with only four accurate significant bits! It looks better when expressed relative to the argument in radians--but often you want the sine or cosine of an angle in degrees...

Anyway, it turns out you don't want a package that uses the built-in trig functions. What about square root, logs, and e^x? They should be all right, but you still have the nasty distributed overhead. Oh! And the stored values for e and the logs to convert to base 2 or 10 are all 66-bit. Maybe some future instruction set extension will provide correct versions of the trig functions without the 32-bit mode requirement.
Robert
12/16/2016 12:38:03 AM
comp.lang.ada

Robert Eachus <rieachus@comcast.net> wrote:
> values for e, and the logs to convert to base 2 or 10 are all 66 bit.
> Maybe some future instruction set extension will provide correct versions
> of the trig functions without the 32-bit mode requirement.

Surely long mode has fpu instructions that can be used?


Luke
12/16/2016 2:00:20 PM
"Robert Eachus" <rieachus@comcast.net> wrote in message 
news:8d0f7f03-9324-4702-9100-d6b8a1f16fc5@googlegroups.com...
>A while back I expressed surprise that there were no versions of the
>Elementary_Functions package for x86/x64 packages which used the
>built-in trig functions of the x86 family.  It looked like an easy project 
>but...

Why? Janus/Ada has always (*) had such a package. It's no problem with a 
32-bit compiler. (Still don't know why anyone would need more than that much 
address space anyway.)

(*) OK, not in the earliest integer-only compilers.

But:

(1) We don't use it for Generic_Elementary_Functions because it's impossible 
to be sure that the built-in instructions meet the Annex G accuracy 
requirements.

(2) Intel's documentation way back when made it clear that you had to do 
argument reduction first (yourself). The instructions were not intended to 
be accurate for values outside of +/- 2*PI (or something like that, I'm 
writing this from memory).

(3) Argument reduction is always going to lose a lot of precision for large 
values, when you start with a 64 bit value there isn't going to be much left 
if the value is large. Hard to blame that mathematical fact on the hardware.

(4) Intel doesn't seem to have done anything performance-wise for these 
instructions. Last time I checked, they were only a few times faster than 
the usual software approximation. Not enough to make them useful outside of 
the most time-critical application. (Perhaps some preconditions would help 
here, as a lot of the cost is checking that the values are in range before 
applying the hardware operation. Moving that to the call-site would allow it 
to be eliminated.)

In any case, in general, I'd trust the Ada implementer to have looked at the 
issues and to have come up with the best possible implementation on the 
hardware. They have a lot more at stake than any individual user (and a lot 
more tools as well). If they're not using something, most likely it's 
because of a good reason or two or six. :-)

                                Randy.


Randy
12/16/2016 8:16:23 PM
I don't understand what you are worried about.

I suppose all (too lazy to check ALL instructions) floating point instructions are supported in 64-bit code. GNAT's RTL does all the necessary software preprocessing to obtain good precision. The OS does all the necessary context save/restore operations on task/process switching. You just need to instantiate Ada.Numerics.Generic_Elementary_Functions and use it. ;)
Vadim
12/16/2016 8:50:04 PM
On Friday, December 16, 2016 at 3:16:25 PM UTC-5, Randy Brukardt wrote:

> (1) We don't use it for Generic_Elementary_Functions because it's impossible
> to be sure that the built-in instructions meet that Annex G accuracy
> requirements.
>
Correct
>
> (2) Intel's documentation way back when made it clear that you had to do
> argument reduction first (yourself). The instructions were not intended to
> be accurate for values outside of +/- 2*PI (or something like that, I'm
> writing this from memory).
Actually more like +/- Pi/4 for Cosine and +/- Pi/8 for tangent.
>
> (3) Argument reduction is always going to lose a lot of precision for large
> values, when you start with a 64 bit value there isn't going to be much left
> if the value is large. Hard to blame that mathematical fact on the hardware.
>
The problem is the 66-bit value of Pi in the hardware. Look at the sine of a number close to Pi; call it X. The sine will be very close to X - Pi. Assuming 64-bit (double) precision for X, the mantissa will get a couple of bits, perhaps none, from X, and the rest of the bits will come from bits 49-96 of Pi. Use the 80-bit extended precision which the x87 instructions support, and you will be taking bits 65 to 128 from the value of Pi.
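The bit-counting argument can be checked numerically. Below is a small Python sketch (my own illustration, nothing read from the actual hardware): it simulates the instruction's internal constant by rounding a high-precision Pi to a 66-bit significand, then reduces the double closest to Pi both ways:

```python
import math
from fractions import Fraction

# Pi truncated at 50 digits -- far more precision than 66 bits.
PI = Fraction("3.14159265358979323846264338327950288419716939937510")

# Simulate a 66-bit-significand pi: pi lies in [2, 4), so the last of
# 66 significand bits has weight 2**-64.
PI66 = Fraction(round(PI * 2**64), 2**64)

X = Fraction(math.pi)      # the binary64 value closest to pi (exact as a rational)
exact = X - PI             # a correctly reduced argument; sin(X) is ~ this value
via66 = X - PI66           # what reduction against the 66-bit pi yields

rel_err = abs((via66 - exact) / exact)
print(float(exact))        # about -1.2246e-16
print(float(rel_err))      # on the order of 1e-5: far above the 2**-53 one wants
```

The relative error of the reduced argument is set by the discarded tail of Pi divided by the (tiny) distance from X to Pi, which is why the damage gets worse as the argument gets closer to a multiple of Pi.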

Could Intel have done the range reduction right? Sure. It would add a few instructions to the microcode, and require a longer value for Pi.
>
> In any case, in general, I'd trust the Ada implementer to have looked at the
> issues and having come up with the best possible implementation on the
> hardware. They have a lot more at stake than any individual user (and a lot
> more tools as well). If they're not using something, most likely it's
> because of a good reason or two or six. :-)
>
Creating a package which does the range reduction right, and passes small values through to the hardware instructions, is not all that hard. However, FXSAVE and FXRSTOR do not save (and restore) the ST(x)/MMx registers unless they have been used. Other threads running at the same time are unlikely to be using these registers, but the OS will need to save and restore these registers when moving to and from your thread.

In other words, the actual user instructions executed for an x87 trig function may be fewer and faster than doing it all in 64/128-bit XMM/YMM registers, but the overhead on thread switches and interrupts will more than make up for it. The Elementary_Functions package will have to run in a 32-bit thread, so unless your entire program is in 32-bit mode, you will pay this cost on every call.
Robert
12/16/2016 11:20:00 PM
On Saturday, December 17, 2016 at 1:20:03 AM UTC+2, Robert Eachus wrote:
> On Friday, December 16, 2016 at 3:16:25 PM UTC-5, Randy Brukardt wrote:
>
> > (1) We don't use it for Generic_Elementary_Functions because it's impossible
> > to be sure that the built-in instructions meet that Annex G accuracy
> > requirements.
> >
> Correct
> >
> > (2) Intel's documentation way back when made it clear that you had to do
> > argument reduction first (yourself). The instructions were not intended to
> > be accurate for values outside of +/- 2*PI (or something like that, I'm
> > writing this from memory).
> Actually more like +/- Pi/4 for Cosine and +/- Pi/8 for tangent.
> >
> > (3) Argument reduction is always going to lose a lot of precision for large
> > values, when you start with a 64 bit value there isn't going to be much left
> > if the value is large. Hard to blame that mathematical fact on the hardware.
> >
> The problem is the 66-bit value of Pi in the hardware. Look at the sine of a
> number close to Pi; call it X.  The sine will be very close to X - Pi.
> Assuming 64-bit (double) precision for X, the mantissa will get a couple of
> bits, perhaps none, from X, and the rest of the bits will come from bits
> 49-96 of Pi.  Use the 80-bit extended precision which the x87 instructions
> support, and you will be taking bits 65 to 128 from the value of Pi.
>

That part ignores the problem of GIGO.
When we say that the argument of sin is x, then 99.9999% of the time it does not mean that the argument of sin is *exactly* x, but that it is most probably in the range [x-0.5*ULP..x+0.5*ULP]. Or worse.
The x87 implementation of sin(x) for abs(x) > 2*pi always returns a value in the range [sin(x-0.5*ULP)..sin(x+0.5*ULP)] (actually, 2 times better than that for extended precision, or 4096 times better for double precision), so for 99.9999% of us it is more than good enough.
The remaining 0.0001% of us are either mathematicians, who hopefully know what they are doing, or sensationalists, who should be ignored by any sane designer of an RTL.

> Could Intel have done the range reduction right?  Sure.  It would add a few
> instructions to the micro code, and require a longer value for Pi.
> >
> > In any case, in general, I'd trust the Ada implementer to have looked at the
> > issues and having come up with the best possible implementation on the
> > hardware. They have a lot more at stake than any individual user (and a lot
> > more tools as well). If they're not using something, most likely it's
> > because of a good reason or two or six. :-)
> >
> Creating a package which does the range reduction right, and passes small
> values through to the hardware instructions is not all that hard.

Such implementations are common, due to the hype created by sensationalists.
But, for reasons stated above, I personally consider them a disservice to the overwhelming majority of users, who would be served much better by using the x87 implementation "as is".

> However, FXSAVE and FXRSTOR do not save (and restore) the ST(x)/MMx registers
> unless they have been used. Other threads running at the same time are
> unlikely to be using these registers, but the OS will need to save and
> restore these registers when moving to and from your thread.
>=20
> In other words, the actual user instructions executed for an x87 trig
> function may be fewer and faster than doing it all in 64/128-bit XMM/YMM
> registers, but the overhead on thread switches and interrupts will more than
> make up for it.

It depends on the number of x87 instructions used between task switches.
If there are more than a few dozen normal arithmetic instructions, or more than a couple of transcendental instructions, then the overhead per instruction will be negligible.

Also, as far as I am concerned, the best thing about the x87 sin instruction is not that it is faster than a competent AVX implementation (for vectorizable cases it is likely non-trivially slower than a competent AVX implementation, like the one in IPP), but that, for small arguments (range [-pi/2..pi/2]), it is more precise. I.e., in a competent AVX implementation one would expect correctly rounded results for 85-90% of small double precision arguments. On the other hand, the x87 implementation will produce correctly rounded results for something like 99.98% of small double precision arguments.

> The Elementary_Functions package will have to run in a 32-bit thread, so
> unless your entire program is in a 32-bit mode, you will pay this cost on
> every call.

That part makes no sense to me whatsoever.
I think that you have some misconception about x87 in long mode.
FYI, as far as the hardware and popular x64 OSes (Windows/Linux/BSD, I suppose Mac OS X too, although I don't know it for sure) go, x87 in long mode just works.



already5chosen
12/18/2016 10:09:54 AM
On Sunday, December 18, 2016 at 5:09:55 AM UTC-5, already...@yahoo.com wrote:

> Such implementations are common, due to the hype created by sensationalists.
> But, for reasons stated above, I personally consider them a disservice to
> the overwhelming majority of users, who would be served much better by
> using the x87 implementation "as is".

I am one of those other users. In particular, tangents near Pi/4 (90 degrees) have very large swings for small errors. A bad value of Pi can result in a very large negative value for the tangent instead of a very large positive value.
> I think that you have some misconception about x87 in long mode.
> FYI, as far as hardware and popular x64 OSes (Windows/Linux/BSD, I suppose
> Mac OS X too, although I don't know it for sure) goes, x87 in long mode
> just works.

For now, maybe:

"Early reports claimed that the operating system scheduler would not save and restore the x87 FPU machine state across thread context switches. Observed behavior shows that this is not the case: the x87 state is saved and restored, except for kernel mode-only threads (a limitation that exists in the 32-bit version as well). The most recent documentation available from Microsoft states that the x87/MMX/3DNow! instructions may be used in long mode, but that they are deprecated and may cause compatibility problems in the future."  (From Wikipedia)

Obviously, a package which produces garbage if some other thread running on the same CPU core is using MMX or x87 instructions is not something I want my name attached to. That means that I have to either figure out a test* which guarantees that the x87 registers are saved and restored correctly, or mark the package as 32-bit compatibility code.

By the way, you may want to check which mode of code your compiler puts out. For (AMD and Intel) x64 CPUs there are three: Legacy, Compatibility, and Long modes. Long mode uses 64-bit addressing throughout. Compatibility mode allows users a 32-bit (4 Gigabyte) virtual address space per program/application. A program can mix long and compatibility mode sections, and in practice most libraries are compatibility mode.

* The test can be in the body of the package and executed only once per program execution. If there were a bit I could check...
Robert
12/18/2016 2:19:10 PM
On 12/18/2016 03:19 PM, Robert Eachus wrote:
> On Sunday, December 18, 2016 at 5:09:55 AM UTC-5, already...@yahoo.com wrote:
>

> I am one of those other users.  In particular, tangents near Pi/4 (90 degrees) have very large swings for small errors.  A bad value of Pi can result in a very large negative value for the tangent instead of a very large positive value.

This seems to be a bad mathematical formulation of some problem, as is 
often the case when a very high numerical precision of a numerical 
function is demanded. But tell me if I am wrong. Close to Pi/4 the 
following relation holds:

    tan (Pi/4 + phi) -> -1/phi, for phi -> 0,

so an approximation like this should be used for values close to Pi/4. 
Is the angle really known with a precision which justifies a special 
implementation of the trigonometric functions?
-- 
Frank Hrebabetzky		+49 / 6355 / 989 5070

hreba
12/18/2016 3:45:43 PM
On Sunday, December 18, 2016 at 4:19:12 PM UTC+2, Robert Eachus wrote:
> On Sunday, December 18, 2016 at 5:09:55 AM UTC-5, already...@yahoo.com wrote:
>
> > Such implementations are common, due to the hype created by sensationalists.
> > But, for reasons stated above, I personally consider them a disservice to
> > the overwhelming majority of users, who would be served much better by
> > using the x87 implementation "as is".
>
> I am one of those other users.

What do you mean?
Are you one of the 99.9999% that will be better served by the fptan instruction 'as is', or one of the 0.0001% that can benefit from more precise range reduction?

> In particular, tangents near Pi/4 (90 degrees)

pi/2

> have very large swings for small errors.  A bad value of Pi can result in
> a very large negative value for the tangent instead of a very large
> positive value.

How do you calculate the argument of the tangent?
Is it the result of addition/subtraction? In that case you are crying over spilled milk, because the addition/subtraction already killed the precision of the argument, and the precision of evaluation of the tangent is already irrelevant.

Or is it a result of multiplication by Pi? Then you are using the wrong function. You should use tanPi(). Well, I know, easier said than done. tanPi() has been part of the IEEE standard for more than a decade, but it is still not part of the standard libraries of the majority of popular languages.

So, what should you do if you want to calculate tan(x*pi) for a value of x very close to 0.5 and tanPi() is not available?
The answer is obvious: calculate 1/tan((0.5-x)*pi).
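A quick Python sketch of that substitution (the particular x is just an illustration):

```python
import math

x = 0.4999999999                       # very close to 0.5

# Direct form: x*math.pi rounds once, smearing the tiny distance to pi/2.
naive = math.tan(x * math.pi)

# Suggested form: 0.5 - x is computed exactly here (Sterbenz's lemma applies
# because x is within a factor of two of 0.5), so only the final small
# multiply by pi rounds.
better = 1.0 / math.tan((0.5 - x) * math.pi)

print(naive, better)                   # both are around 3.18e9
```

The two agree to several digits, but the second form has removed the one rounding step that was amplified by the nearness to the pole.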

Anyway, more precise argument reduction for tan() is not something that can help your case.


>
> > I think that you have some misconception about x87 in long mode.
> > FYI, as far as hardware and popular x64 OSes (Windows/Linux/BSD, I suppose
> > Mac OS X too, although I don't know it for sure) goes, x87 in long mode
> > just works.
>
> For now, maybe:
>
> "Early reports claimed that the operating system scheduler would not save
> and restore the x87 FPU machine state across thread context switches.
> Observed behavior shows that this is not the case: the x87 state is saved
> and restored, except for kernel mode-only threads (a limitation that exists
> in the 32-bit version as well). The most recent documentation available from
> Microsoft states that the x87/MMX/3DNow! instructions may be used in long
> mode, but that they are deprecated and may cause compatibility problems in
> the future."  (From Wikipedia)

Sounds like political bullshit.
Neither Microsoft nor any other major OS vendor will ever stop saving/restoring x87 state on task switch on the x64 platform, because the disadvantages of doing so are too big and the *technical* (as opposed to political) advantages are almost non-existent.
They (Microsoft) surely want to maximize similarity between x64 and other platforms, esp. aarch64, and they surely would like to kill x87 because of that, but they simply can't do it.

>
> Obviously, a package which results in garbage if some other thread running
> on the same CPU core is using MMX or x87 instructions is not something I
> want my name attached to.  That means that I have to either figure out a
> test* which guarantees that the x87 registers are saved and restored
> correctly, or mark the package as 32-bit compatibility code.

It is always saved/restored. It will continue to be saved/restored for as long as the x64 platform is supported.

>
> By the way, you may want to check which mode of code your compiler puts
> out.  For (AMD and Intel) x64 CPUs there are three: Legacy, Compatibility,
> and Long modes.  Long mode uses 64-bit addressing throughout. Compatibility
> mode allows users a 32-bit (4 Gigabyte) virtual address space per
> program/application.  A program can mix long and compatibility mode
> sections, and in practice most libraries are compatibility mode.

I don't understand what you are talking about.
In-process libraries, like those that we discuss, always run in the same mode as the rest of the process. At least that's how it works on Windows/Linux/*BSD on x64.
I heard that IBM z/OS works differently, but in the context of x87 that particular bit of trivia does not appear relevant.

>
> * The test can be in the body of the package and executed only once per
> program execution.  If there were a bit I could check...

already5chosen
12/18/2016 3:47:04 PM
<already5chosen@yahoo.com> wrote in message 
news:a6734b29-5ffc-4938-bbc2-453f7ae92325@googlegroups.com...
...
>> Creating a package which does the range reduction right, and passes
>> small values through to the hardware instructions is not all that hard.
>
>Such implementations are common, due the hype created by sensationalists.

Yeah, like the people at Intel who document how to use these instructions. 
:-) They recommend (at least in the documents I have seen) to do argument 
reduction before using the instructions.

There's also the little matter of meeting the Ada language requirements. The 
people who put accuracy requirements on Ada numeric libraries might have 
been "sensationalists", but those of us implementing Ada have to abide by 
those requirements. (Arguably, one of the advantages of Ada is that there are 
requirements, so you have some assurance out of the box that the numerics 
will work predictably, no matter what target you use.)

                                 Randy.


Randy
12/19/2016 11:11:25 PM
On Tuesday, December 20, 2016 at 1:11:26 AM UTC+2, Randy Brukardt wrote:
> <already5chosen@yahoo.com> wrote in message
> news:a6734b29-5ffc-4938-bbc2-453f7ae92325@googlegroups.com...
> ...
> >> Creating a package which does the range reduction right, and passes
> >> small values through to the hardware instructions is not all that hard.
> >
> >Such implementations are common, due the hype created by sensationalists.
>
> Yeah, like the people at Intel who document how to use these instructions.
> :-)
> They recommend (at least in the documents I have seen) to do argument
> reduction before using the instructions.
>

I don't know what documents you have seen.
In the "Intel 64 and IA-32 Architectures Software Developer's Manual", Intel recommends using argument reduction when the argument is out of the supported range [-2**63..+2**63].
Of course, trigs with a double precision argument outside the range [-2**54..+2**54] are no more than equivalents of a bad PRNG, so Intel's advice does not have a lot of practical significance.
IMHO, for x outside of [-2**63..+2**63] it would be o.k. for sin(x) and cos(x) to return any value in the range [-1..1]. They are all equally meaningless.
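The "bad PRNG" point is easy to see just from the spacing of doubles at that magnitude (Python sketch):

```python
import math

x = 2.0 ** 60                          # comfortably inside [-2**63..+2**63]
x_next = math.nextafter(x, math.inf)   # the adjacent representable double

# Neighbouring doubles at this magnitude are 256 apart -- about 40 full
# periods of sin -- so the argument carries no phase information at all.
print(x_next - x)                      # 256.0
print(math.sin(x), math.sin(x_next))   # two essentially unrelated values in [-1, 1]
```

Whatever value the library (or instruction) returns, the true sine of the quantity the caller *meant* is already unknowable from the double they passed in.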

> There's also the little matter of meeting the Ada language requirements. The
> people who put accuracy requirements on Ada numeric libraries might have
> been "sensationalists", but those of us implementing Ada have to abide by
> those requirements.

Unfortunately, I don't know where to look for the Ada language requirements.
Can you tell me what the requirements are for sin/cos in the Ada numeric library?

> (Arguably, one of the advantages of Ada is that there are
> requirements, so you have some assurance out of the box that the numerics
> will work predictably, no matter what target you use.)
>
>                                  Randy.

Yes, before IEEE-754 that was a significant advantage*.
Today, less so.

----------
* - I wonder what happened in the real world on machines with bad arithmetic, like the CRAY-1? Did it not lead to a situation where significant parts of the Ada numeric library were so slow that users concerned with speed were rolling their own alternatives?
already5chosen
12/19/2016 11:49:36 PM
On 16-12-20 01:49 , already5chosen@yahoo.com wrote:
> On Tuesday, December 20, 2016 at 1:11:26 AM UTC+2, Randy Brukardt wrote:
 >> ...
>> There's also the little matter of meeting the Ada language requirements. The
>> people who put accuracy requirements on Ada numeric libraries might have
>> been "sensationalists"", but those of us implementing Ada have to abide by
>> those requirements.
>
> Unfortunately, I don't know where to look for Ada language requirement.
> Can you tell me what are requirements for sin/cos in Ada numeric library?

RM section G.2.4, Accuracy Requirements for the Elementary Functions.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .
Niklas
12/20/2016 5:27:31 AM
Niklas Holsti <niklas.holsti@tidorum.invalid> writes:

> On 16-12-20 01:49 , already5chosen@yahoo.com wrote:
>> On Tuesday, December 20, 2016 at 1:11:26 AM UTC+2, Randy Brukardt wrote:
>>> ...
>>> There's also the little matter of meeting the Ada language
>>> requirements. The people who put accuracy requirements on Ada
>>> numeric libraries might have been "sensationalists"", but those of
>>> us implementing Ada have to abide by those requirements.
>>
>> Unfortunately, I don't know where to look for Ada language
>> requirement.  Can you tell me what are requirements for sin/cos in
>> Ada numeric library?
>
> RM section G.2.4, Accuracy Requirements for the Elementary Functions.

http://www.ada-auth.org/standards/rm12_w_tc1/html/RM-G-2-4.html
Simon
12/20/2016 8:37:16 AM
On 20/12/2016 09:37, Simon Wright wrote:
> Niklas Holsti <niklas.holsti@tidorum.invalid> writes:

>> RM section G.2.4, Accuracy Requirements for the Elementary Functions.
>
> http://www.ada-auth.org/standards/rm12_w_tc1/html/RM-G-2-4.html
>

Also in

file:///GNAT_INSTALL_PATH/share/doc/gnat/txt/arm-12.txt

and similarly placed in other installations of Ada. The standard
is freely available, will be found on your disk, and
should be accessible from within your IDE.
G
12/20/2016 9:12:38 AM
On Tuesday, December 20, 2016 at 7:27:32 AM UTC+2, Niklas Holsti wrote:
> On 16-12-20 01:49 , already5chosen@yahoo.com wrote:
> > On Tuesday, December 20, 2016 at 1:11:26 AM UTC+2, Randy Brukardt wrote:
> >> ...
> >> There's also the little matter of meeting the Ada language requirements.
> >> The people who put accuracy requirements on Ada numeric libraries might
> >> have been "sensationalists", but those of us implementing Ada have to
> >> abide by those requirements.
> >
> > Unfortunately, I don't know where to look for Ada language requirement.
> > Can you tell me what are requirements for sin/cos in Ada numeric library?
>
> RM section G.2.4, Accuracy Requirements for the Elementary Functions.
>
> --
> Niklas Holsti
> Tidorum Ltd
> niklas holsti tidorum fi
>        .      @       .

Thank you.

At first glance it seems that Randy Brukardt is correct.
Assuming IEEE binary64, for any x in the range [-2**26.5..2**26.5] the requirements want Sin(x) to return a number in the range [exact_sin(x)*(1-d*2**(-53))..exact_sin(x)*(1+d*2**(-53))] where d=2.
The x87 SIN instruction by itself achieves the specified precision in a smaller range (~ up to abs(x) < 2**14).

It means that conforming implementations of Ada libraries are forced to spend significant effort doing reduction of stupidly big arguments of sin/cos/tan. On the other hand, the requirements allow rather poor precision for small arguments (d=2) where better precision (d=1.25 or d=1.125) is both desirable and not especially hard to achieve.
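For what it's worth, the d=2 bound is easy to spot-check against an exact Taylor evaluation. The Python sketch below exercises the host libm rather than any particular Ada RTL, so it only illustrates the kind of test involved:

```python
import math
from fractions import Fraction

def sin_exact(x, terms=30):
    """Taylor series for sin in exact rational arithmetic.  The float
    argument converts to an exact Fraction, so the only error is the
    truncated tail, which is negligible for |x| < 2 with 30 terms."""
    xf = Fraction(x)
    term, total = xf, Fraction(0)
    for n in range(terms):
        total += term
        term *= -xf * xf / ((2 * n + 2) * (2 * n + 3))
    return total

EPS = Fraction(2) ** -53            # so the bound below is d * 2**(-53) with d = 2
for x in (0.125, 0.5, 1.0, 1.5):
    rel = abs(Fraction(math.sin(x)) - sin_exact(x)) / abs(sin_exact(x))
    assert rel <= 2 * EPS           # within the d = 2 relative-error bound
print("ok")
```

A libm with at most 1 ulp of error passes comfortably at these sample points, which is consistent with the observation that the requirement is loose for small arguments.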

There is a consolation, too: the range reduction in the minimal range required by the standard can be done without big tables. And it can be implemented relatively quickly if the hardware features fused multiply-add.

On a somewhat related note, it seems to me that the forward trigonometric functions with a 'Cycle' argument are underspecified. I'd like the requirements for the extremely useful special cases of 'Cycle' == an exact power of 2 to be more pronounced.

already5chosen
12/20/2016 6:01:53 PM
<already5chosen@yahoo.com> wrote in message 
news:c3725fa2-7eaa-4020-b0ae-6ddcfc2a3d1d@googlegroups.com...
...
>At first glance it seems that Randy Brukardt is correct.

It seems likely, even though today I'd be hard pressed to explain why in any 
detail. We're talking about work done in the mid-1990s.

>Assuming IEEE binary64, for any x in range [-2**26.5..2**26.5] requirements
>want Sin(x) to return the number in range [exact_sin(x)*(1-d*2**(-53)) ..
>exact_sin(x)*(1+d*2**(-53))] where d=2.
>x87 SIN instruction by itself achieves specified precision in smaller range
> (~ up to abs(x) < 2**14).
>
>It means that conforming implementations of Ada libraries are forced to spend
>a significant effort doing reduction of stupidly big arguments of 
>sin/cos/tan.

Also on doing sanity checks of operands (but of course that's generally a 
strength of Ada). A version of GEF (Generic_Elementary_Functions) that used 
Ada 2012 preconditions could avoid much of that overhead, and would be an 
advantage for really speed-critical operations.

>On the other hand The Requirements allow rather poor precision for small
>arguments (d=2) where better precision (d=1.25 or d=1.125) is both
>desirable and not especially hard to achieve.

Keep in mind that the requirements were written by a team of numerical 
analysis experts in 1992-4 based on the state of the art at that time. 
(Probably the Cody/Waite algorithms.) Those of us who maintain the Standard 
today don't really have the expertise to make any informed changes to these 
rules, so for the most part we keep our hands off (the worst thing we could 
do would be to mess up carefully considered rules - but we have fixed a 
number of obvious errors).

>There is a consolation, too - the range reduction in the minimal range 
>required
>by the standard can be done without big tables. And it can be implemented
>relatively quickly if the hardware features fused multiply-add.

My recollection is that it isn't even that bad if one writes the entire 
algorithm in Ada. (Since our compiler generally keeps intermediate results 
in the 80-bit extended format, we tended to write as much as possible as a 
single large expression. Probably could do that even better today with Ada 
2012 conditional expressions.)

>On somewhat related note, it seems to me that forward trigonometric 
>functions
>with 'Cycle' argument are underspecified. I'd like the requirements for 
>extremely
>useful special cases of 'Cycle' == exact power of 2 to be more pronounced.

It's certainly possible. We'd welcome someone with substantial numeric 
experience contributing to the ARG for Ada Standards maintenance. Most of us 
know enough to understand the Ada numeric model and when something is 
incorrect for it, but we don't have anyone very good at the subtle details.

                                           Randy.



Randy
12/21/2016 1:20:50 AM
On Wednesday, December 21, 2016 at 3:20:52 AM UTC+2, Randy Brukardt wrote:
> <already5chosen@yahoo.com> wrote in message
> news:c3725fa2-7eaa-4020-b0ae-6ddcfc2a3d1d@googlegroups.com...
> ...
> >At first glance it seems that Randy Brukardt is correct.
>
> It seems likely, even though today I'd be hard pressed to explain why in any
> detail. We're talking about work done in the mid-1990s.
>
> >Assuming IEEE binary64, for any x in range [-2**26.5..2**26.5] requirements
> >want Sin(x) to return the number in range [exact_sin(x)*(1-d*2**(-53)) ..
> >exact_sin(x)*(1+d*2**(-53))] where d=2.
> >x87 SIN instruction by itself achieves specified precision in smaller range
> > (~ up to abs(x) < 2**14).
> >
> >It means that conforming implementations of Ada libraries are forced to
> >spend a significant effort doing reduction of stupidly big arguments of
> >sin/cos/tan.
>
> Also on doing sanity checks of operands

That's not the same.
Unlike argument reduction, range/Inf/NaN checks are computationally cheap.

> (but of course that's generally a
> strength of Ada).

In the specific case of numeric libraries, sanity checks are not unique to Ada. Right now I can't recall a language that does *not* do sanity checks in its numeric library. Except, of course, for functions like sin/cos where some values of the input argument do not make physical sense, but nevertheless all inputs are legal.

> A version of GEF (Generic_Elementary_Functions) that used
> Ada 2012 preconditions could avoid much of that overhead, and would be an
> advantage for really speed-critical operations.
>
> >On the other hand The Requirements allow rather poor precision for small
> >arguments (d=2) where better precision (d=1.25 or d=1.125) is both
> >desirable and not especially hard to achieve.
>
> Keep in mind that the requirements were written by a team of numerical
> analysis experts in 1992-4 based on the state of the art at that time.
> (Probably the Cody/Waite algorithms.) Those of us who maintain the Standard
> today don't really have the expertise to make any informed changes to these
> rules, so for the most part we keep our hands off (the worst thing we could
> do would be to mess up carefully considered rules - but we have fixed a
> number of obvious errors).
>
> >There is a consolation, too - the range reduction in the minimal range
> >required
> >by the standard can be done without big tables. And it can be implemented
> >relatively quickly if the hardware features fused multiply-add.
>
> My recollection is that it isn't even that bad if one writes the entire
> algorithm in Ada. (Since our compiler generally keeps intermediate results
> in the 80-bit extended format, we tended to write as much as possible as a
> single large expression. Probably could do that even better today with Ada
> 2012 conditional expressions.)

An 80-bit format helps when the underlying machine has an 80-bit format. Many machines do not.
Also, I suppose that even on x86/AMD64 machines that do support the 80-bit format, modern code generators by default use SSE/AVX registers rather than x87 registers.
FMA makes argument reduction easier even when extended precision is not available.

Besides, it seems to me that at the upper edge of our problematic range, 80-bit precision of intermediate results is insufficient for really simple range reduction. To make things really simple, one needs intermediates with 53+26.5=80 bits of mantissa. 80-bit extended precision has only 64 bits of mantissa, so you'll still need to do the reduction in several carefully measured steps. FMA, on the other hand, when used smartly, gives you the equivalent of an intermediate with a 106-bit mantissa.
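To illustrate the multi-part idea (with a two-double split rather than FMA), here is a minimal Cody/Waite-style reduction sketched in Python. The constants are derived from a high-precision Pi; the scheme is only claimed to hold for moderate k, where the high product stays exact:

```python
import math
from fractions import Fraction

# Pi truncated at 50 digits, used only to derive the split constants below.
PI = Fraction("3.14159265358979323846264338327950288419716939937510")

# High part: pi truncated to a 29-bit significand (trailing zero bits in
# binary64), so k * PI_HI is exact for |k| up to about 2**24.
# Low part: the next ~53 bits of pi, so PI_HI + PI_LO carries ~81 bits.
PI_HI = float.fromhex("0x1.921fb54p+1")
PI_LO = float(PI - Fraction(PI_HI))

def sin_reduced(x):
    """sin(x) after reducing x modulo pi against ~81 bits of pi."""
    k = round(x / math.pi)
    r = (x - k * PI_HI) - k * PI_LO   # the first subtraction is exact for moderate x
    return math.sin(r) if k % 2 == 0 else -math.sin(r)

print(sin_reduced(1.0e6), math.sin(1.0e6))   # the two agree closely
```

An FMA-based version would replace the two subtractions with fused operations and extend the usable range further; the principle (keep the reduction error below the final rounding) is the same.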


>
> >On somewhat related note, it seems to me that forward trigonometric
> >functions
> >with 'Cycle' argument are underspecified. I'd like the requirements for
> >extremely
> >useful special cases of 'Cycle' == exact power of 2 to be more pronounced.
>
> It's certainly possible. We'd welcome someone with substantial numeric
> experience contributing to the ARG for Ada Standards maintenance. Most of us
> know enough to understand the Ada numeric model and when something is
> incorrect for it, but we don't have anyone very good at the subtle details.
>
>                                            Randy.

I am certainly not enough of an expert.
All I can say is that when Cycle == 1, Sin(x, 1) becomes an equivalent of IEEE-754 sinPi(x), so maybe you can copy the IEEE requirements for sinPi(), if not in the general case then at least for IEEE-754 based platforms.
On a more general note, asking the IEEE-754 committee for help sounds like a good idea.
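For what it's worth, the reason the power-of-2 'Cycle' case is so attractive is that the whole reduction can be made exact. A Python sketch of Sin with Cycle = 1, i.e. sinPi (my own illustration, not any RTL's actual code):

```python
import math

def sin_cycle1(x):
    """Sin(x, Cycle => 1), i.e. sinPi(x): reduce in the cycle domain,
    where the period 2.0 is a power of two and the reduction is exact,
    before ever multiplying by pi."""
    r = math.remainder(x, 2.0)      # exact: IEEE-754 remainder, r in [-1, 1]
    if r > 0.5:                     # fold to [-0.5, 0.5]; both folds are
        r = 1.0 - r                 # exact by Sterbenz's lemma
    elif r < -0.5:
        r = -1.0 - r
    return math.sin(r * math.pi)    # the only rounding happens here

print(sin_cycle1(1e300))            # 0.0: every double that large is an exact
                                    # multiple of the period, so no garbage
```

Contrast this with the radian case, where the period 2*Pi is not representable and the reduction itself injects error; that is exactly why the power-of-2 special cases deserve sharper requirements.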
already5chosen
12/21/2016 9:29:20 AM