eula.txt0000644€­ Q00042560000000302514616534611012455 0ustar aakkasintelallCopyright (c) 2007-2024, Intel Corp. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
EXAMPLES/0000755€­ Q00042560000000000014616534611012204 5ustar aakkasintelallEXAMPLES/README0000644€­ Q00042560000000404414616534611013066 0ustar aakkasintelall
Note: 000, 001, ..., 111 are associated with the following three conditions:

  bit 2 [msb]:
     0 = call by value (except for the pointer to the status flags, passed by
         reference unless global)
     1 = call by reference;
  bit 1 :
     0 = rounding mode passed as a parameter
     1 = rounding mode passed in global variable _IDEC_glbround (fixed name)
  bit 0 [lsb]:
     0 = pointer to status flags passed as a parameter
     1 = status flags passed in global variable _IDEC_glbflags (fixed name)

Example (one of eight possible, for Linux only; similar for other OS-es):

  Build libbid.a in ../LIBRARY with '...CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0'
  $ cp main.c_000 main.c
  $ cp decimal.h_000 decimal.h
  $ icc main.c ../LIBRARY/libbid.a
  $ ./a.out
  Begin Decimal Floating-Point Sanity Check
  TEST CASE 1 FOR bid128_mul 000 () PASSED
  TEST CASE 2 FOR bid128_mul 000 () PASSED
  TEST CASE 3 FOR bid128_mul 000 () PASSED
  End Decimal Floating-Point Sanity Check
  $ rm main.c decimal.h a.out

Note: The scripts and makefiles provided here may need adjustments, depending
on the environment in which they are used; for example if moving files from
Windows to Linux, running dos2unix on the Linux script files may be necessary.

Note: For some other operating systems and architecture combinations see the
following command files, as well as any command files invoked from these ones:

  RUNWINDOWS_nmake.bat
  RUNOSXINTEL64

These command files build and run all eight examples from this directory,
possibly using more than one compiler. Changes may be needed for certain
environments. However, prior to building these examples the similar RUN*
command has to be executed in ../LIBRARY/ in order to build all the necessary
versions of the Intel(R) Decimal Floating-Point Math Library V2.3 (Version 2,
Update 3).
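
The three-digit suffix described in the first note can be decoded mechanically. The following is a minimal sketch in C, not part of the library: the type variant_opts and the functions decode_variant and format_build_options are hypothetical names introduced only for illustration. It assumes that the bits map to the CALL_BY_REF, GLOBAL_RND and GLOBAL_FLAGS build options in the order suggested by the 000 example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type (not part of the library): the three build
   options encoded by a variant suffix such as "000" or "101". */
typedef struct {
    int call_by_ref;   /* bit 2 (msb): 1 = arguments passed by reference */
    int global_rnd;    /* bit 1: 1 = rounding mode in a global variable */
    int global_flags;  /* bit 0 (lsb): 1 = status flags in a global variable */
} variant_opts;

/* Decode a three-character suffix made of '0' and '1'.
   Returns 0 on success, -1 if the suffix is malformed. */
int decode_variant(const char *suffix, variant_opts *out)
{
    if (suffix == NULL || out == NULL || strlen(suffix) != 3)
        return -1;
    for (int i = 0; i < 3; i++)
        if (suffix[i] != '0' && suffix[i] != '1')
            return -1;
    out->call_by_ref  = suffix[0] - '0';
    out->global_rnd   = suffix[1] - '0';
    out->global_flags = suffix[2] - '0';
    return 0;
}

/* Write the build options for a variant into buf, in the same form as the
   option string quoted in the example. Returns the snprintf result. */
int format_build_options(const variant_opts *o, char *buf, size_t len)
{
    return snprintf(buf, len, "CALL_BY_REF=%d GLOBAL_RND=%d GLOBAL_FLAGS=%d",
                    o->call_by_ref, o->global_rnd, o->global_flags);
}
```

Under these assumptions, decoding "000" and formatting the result gives the option string quoted in the example, and the other seven suffixes follow the same pattern.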
The tests [when built correctly] pass if the word FAIL does not appear in the output. * Other names and brands may be claimed as the property of others. EXAMPLES/RUNLINUXINTEL64_GCC0000755€­ Q00042560000000045614616534611015045 0ustar aakkasintelallecho "BEGIN BUILDING AND RUNNING EXAMPLES IN LINUX..." rm linuxout a.out ./linuxbuild_gcc > linuxout # grep PASS linuxout cat linuxout grep FAIL linuxout rm linuxout main.c decimal.h echo "END BUILDING AND RUNNING EXAMPLES IN LINUX..." echo "THE TESTS PASSED IF THE WORD 'FAIL' WAS NOT PRINTED ABOVE" EXAMPLES/windowsbuild_cl.bat0000755€­ Q00042560000001100314616534611016062 0ustar aakkasintelallecho "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 000 **************************" echo "" echo "" copy /Y main.c_000 main.c copy /Y decimal.h_000 decimal.h copy /Y ..\LIBRARY\cl000libbid.lib ..\LIBRARY\libbid.lib cl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 001 **************************" echo "" echo "" copy /Y main.c_001 main.c copy /Y decimal.h_001 decimal.h copy /Y ..\LIBRARY\cl001libbid.lib ..\LIBRARY\libbid.lib cl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 010 **************************" echo "" echo "" copy /Y main.c_010 main.c copy /Y decimal.h_010 decimal.h copy /Y ..\LIBRARY\cl010libbid.lib ..\LIBRARY\libbid.lib cl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 011 **************************" echo "" echo "" copy /Y main.c_011 main.c copy /Y decimal.h_011 decimal.h copy /Y ..\LIBRARY\cl011libbid.lib ..\LIBRARY\libbid.lib cl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 100 **************************" echo "" echo "" copy /Y main.c_100 main.c copy /Y decimal.h_100 decimal.h copy /Y ..\LIBRARY\cl100libbid.lib ..\LIBRARY\libbid.lib cl 
main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 101 **************************" echo "" echo "" copy /Y main.c_101 main.c copy /Y decimal.h_101 decimal.h copy /Y ..\LIBRARY\cl101libbid.lib ..\LIBRARY\libbid.lib cl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 110 **************************" echo "" echo "" copy /Y main.c_110 main.c copy /Y decimal.h_110 decimal.h copy /Y ..\LIBRARY\cl110libbid.lib ..\LIBRARY\libbid.lib cl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 111 **************************" echo "" echo "" copy /Y main.c_111 main.c copy /Y decimal.h_111 decimal.h copy /Y ..\LIBRARY\cl111libbid.lib ..\LIBRARY\libbid.lib cl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe del main.exe main.c decimal.h del ..\LIBRARY\libbid.lib echo "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 000b **************************" echo "" echo "" copy /Y main.c_000 main.c copy /Y decimal.h_000 decimal.h copy /Y ..\LIBRARY\cl000blibbid.lib ..\LIBRARY\libbid.lib cl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 001b **************************" echo "" echo "" copy /Y main.c_001 main.c copy /Y decimal.h_001 decimal.h copy /Y ..\LIBRARY\cl001blibbid.lib ..\LIBRARY\libbid.lib cl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 010b **************************" echo "" echo "" copy /Y main.c_010 main.c copy /Y decimal.h_010 decimal.h copy /Y ..\LIBRARY\cl010blibbid.lib ..\LIBRARY\libbid.lib cl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 011b **************************" echo "" echo "" copy /Y main.c_011 main.c copy /Y decimal.h_011 decimal.h copy /Y ..\LIBRARY\cl011blibbid.lib 
..\LIBRARY\libbid.lib cl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 100b **************************" echo "" echo "" copy /Y main.c_100 main.c copy /Y decimal.h_100 decimal.h copy /Y ..\LIBRARY\cl100blibbid.lib ..\LIBRARY\libbid.lib cl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 101b **************************" echo "" echo "" copy /Y main.c_101 main.c copy /Y decimal.h_101 decimal.h copy /Y ..\LIBRARY\cl101blibbid.lib ..\LIBRARY\libbid.lib cl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 110b **************************" echo "" echo "" copy /Y main.c_110 main.c copy /Y decimal.h_110 decimal.h copy /Y ..\LIBRARY\cl110blibbid.lib ..\LIBRARY\libbid.lib cl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR cl 111b **************************" echo "" echo "" copy /Y main.c_111 main.c copy /Y decimal.h_111 decimal.h copy /Y ..\LIBRARY\cl111blibbid.lib ..\LIBRARY\libbid.lib cl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe del main.exe main.c decimal.h del ..\LIBRARY\libbid.lib EXAMPLES/windowsbuild_clang.bat0000755€­ Q00042560000001126314616534611016560 0ustar aakkasintelallecho "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 000 **************************" echo "" echo "" copy /Y main.c_000 main.c copy /Y decimal.h_000 decimal.h copy /Y ..\LIBRARY\clang000libbid.lib ..\LIBRARY\libbid.lib clang -o main.exe main.c ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 001 **************************" echo "" echo "" copy /Y main.c_001 main.c copy /Y decimal.h_001 decimal.h copy /Y ..\LIBRARY\clang001libbid.lib ..\LIBRARY\libbid.lib clang -o main.exe main.c ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE 
FOR clang 010 **************************" echo "" echo "" copy /Y main.c_010 main.c copy /Y decimal.h_010 decimal.h copy /Y ..\LIBRARY\clang010libbid.lib ..\LIBRARY\libbid.lib clang -o main.exe main.c ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 011 **************************" echo "" echo "" copy /Y main.c_011 main.c copy /Y decimal.h_011 decimal.h copy /Y ..\LIBRARY\clang011libbid.lib ..\LIBRARY\libbid.lib clang -o main.exe main.c ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 100 **************************" echo "" echo "" copy /Y main.c_100 main.c copy /Y decimal.h_100 decimal.h copy /Y ..\LIBRARY\clang100libbid.lib ..\LIBRARY\libbid.lib clang -o main.exe main.c ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 101 **************************" echo "" echo "" copy /Y main.c_101 main.c copy /Y decimal.h_101 decimal.h copy /Y ..\LIBRARY\clang101libbid.lib ..\LIBRARY\libbid.lib clang -o main.exe main.c ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 110 **************************" echo "" echo "" copy /Y main.c_110 main.c copy /Y decimal.h_110 decimal.h copy /Y ..\LIBRARY\clang110libbid.lib ..\LIBRARY\libbid.lib clang -o main.exe main.c ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 111 **************************" echo "" echo "" copy /Y main.c_111 main.c copy /Y decimal.h_111 decimal.h copy /Y ..\LIBRARY\clang111libbid.lib ..\LIBRARY\libbid.lib clang -o main.exe main.c ..\LIBRARY\libbid.lib %1 main.exe del main.exe main.c decimal.h del ..\LIBRARY\libbid.lib echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 000b **************************" echo "" echo "" copy /Y main.c_000 main.c copy /Y decimal.h_000 decimal.h copy /Y ..\LIBRARY\clang000blibbid.lib ..\LIBRARY\libbid.lib clang -o main.exe 
main.c ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 001b **************************" echo "" echo "" copy /Y main.c_001 main.c copy /Y decimal.h_001 decimal.h copy /Y ..\LIBRARY\clang001blibbid.lib ..\LIBRARY\libbid.lib clang -o main.exe main.c ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 010b **************************" echo "" echo "" copy /Y main.c_010 main.c copy /Y decimal.h_010 decimal.h copy /Y ..\LIBRARY\clang010blibbid.lib ..\LIBRARY\libbid.lib clang -o main.exe main.c ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 011b **************************" echo "" echo "" copy /Y main.c_011 main.c copy /Y decimal.h_011 decimal.h copy /Y ..\LIBRARY\clang011blibbid.lib ..\LIBRARY\libbid.lib clang -o main.exe main.c ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 100b **************************" echo "" echo "" copy /Y main.c_100 main.c copy /Y decimal.h_100 decimal.h copy /Y ..\LIBRARY\clang100blibbid.lib ..\LIBRARY\libbid.lib clang -o main.exe main.c ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 101b **************************" echo "" echo "" copy /Y main.c_101 main.c copy /Y decimal.h_101 decimal.h copy /Y ..\LIBRARY\clang101blibbid.lib ..\LIBRARY\libbid.lib clang -o main.exe main.c ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 110b **************************" echo "" echo "" copy /Y main.c_110 main.c copy /Y decimal.h_110 decimal.h copy /Y ..\LIBRARY\clang110blibbid.lib ..\LIBRARY\libbid.lib clang -o main.exe main.c ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 111b **************************" echo "" echo "" copy /Y main.c_111 main.c copy /Y decimal.h_111 decimal.h copy /Y 
..\LIBRARY\clang111blibbid.lib ..\LIBRARY\libbid.lib clang -o main.exe main.c ..\LIBRARY\libbid.lib %1 main.exe del main.exe main.c decimal.h del ..\LIBRARY\libbid.lib EXAMPLES/windowsbuild_icl.bat0000755€­ Q00042560000001106314616534611016241 0ustar aakkasintelallecho "" echo "" echo "***************** RUNNING EXAMPLE FOR icl 000 **************************" echo "" echo "" copy /Y main.c_000 main.c copy /Y decimal.h_000 decimal.h copy /Y ..\LIBRARY\icl000libbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icl 001 **************************" echo "" echo "" copy /Y main.c_001 main.c copy /Y decimal.h_001 decimal.h copy /Y ..\LIBRARY\icl001libbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icl 010 **************************" echo "" echo "" copy /Y main.c_010 main.c copy /Y decimal.h_010 decimal.h copy /Y ..\LIBRARY\icl010libbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icl 011 **************************" echo "" echo "" copy /Y main.c_011 main.c copy /Y decimal.h_011 decimal.h copy /Y ..\LIBRARY\icl011libbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icl 100 **************************" echo "" echo "" copy /Y main.c_100 main.c copy /Y decimal.h_100 decimal.h copy /Y ..\LIBRARY\icl100libbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icl 101 **************************" echo "" echo "" copy /Y main.c_101 main.c copy /Y decimal.h_101 decimal.h copy /Y ..\LIBRARY\icl101libbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo 
"***************** RUNNING EXAMPLE FOR icl 110 **************************" echo "" echo "" copy /Y main.c_110 main.c copy /Y decimal.h_110 decimal.h copy /Y ..\LIBRARY\icl110libbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icl 111 **************************" echo "" echo "" copy /Y main.c_111 main.c copy /Y decimal.h_111 decimal.h copy /Y ..\LIBRARY\icl111libbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe del main.exe main.c decimal.h del ..\LIBRARY\libbid.lib echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icl 000b **************************" echo "" echo "" copy /Y main.c_000 main.c copy /Y decimal.h_000 decimal.h copy /Y ..\LIBRARY\icl000blibbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icl 001b **************************" echo "" echo "" copy /Y main.c_001 main.c copy /Y decimal.h_001 decimal.h copy /Y ..\LIBRARY\icl001blibbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icl 010b **************************" echo "" echo "" copy /Y main.c_010 main.c copy /Y decimal.h_010 decimal.h copy /Y ..\LIBRARY\icl010blibbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icl 011b **************************" echo "" echo "" copy /Y main.c_011 main.c copy /Y decimal.h_011 decimal.h copy /Y ..\LIBRARY\icl011blibbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icl 100b **************************" echo "" echo "" copy /Y main.c_100 main.c copy /Y decimal.h_100 decimal.h copy /Y ..\LIBRARY\icl100blibbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS 
..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icl 101b **************************" echo "" echo "" copy /Y main.c_101 main.c copy /Y decimal.h_101 decimal.h copy /Y ..\LIBRARY\icl101blibbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icl 110b **************************" echo "" echo "" copy /Y main.c_110 main.c copy /Y decimal.h_110 decimal.h copy /Y ..\LIBRARY\icl110blibbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icl 111b **************************" echo "" echo "" copy /Y main.c_111 main.c copy /Y decimal.h_111 decimal.h copy /Y ..\LIBRARY\icl111blibbid.lib ..\LIBRARY\libbid.lib icl main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe del main.exe main.c decimal.h del ..\LIBRARY\libbid.lib EXAMPLES/windowsbuild_icx.bat0000755€­ Q00042560000001106314616534611016255 0ustar aakkasintelallecho "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 000 **************************" echo "" echo "" copy /Y main.c_000 main.c copy /Y decimal.h_000 decimal.h copy /Y ..\LIBRARY\icx000libbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 001 **************************" echo "" echo "" copy /Y main.c_001 main.c copy /Y decimal.h_001 decimal.h copy /Y ..\LIBRARY\icx001libbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 010 **************************" echo "" echo "" copy /Y main.c_010 main.c copy /Y decimal.h_010 decimal.h copy /Y ..\LIBRARY\icx010libbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 011 **************************" echo "" echo 
"" copy /Y main.c_011 main.c copy /Y decimal.h_011 decimal.h copy /Y ..\LIBRARY\icx011libbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 100 **************************" echo "" echo "" copy /Y main.c_100 main.c copy /Y decimal.h_100 decimal.h copy /Y ..\LIBRARY\icx100libbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 101 **************************" echo "" echo "" copy /Y main.c_101 main.c copy /Y decimal.h_101 decimal.h copy /Y ..\LIBRARY\icx101libbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 110 **************************" echo "" echo "" copy /Y main.c_110 main.c copy /Y decimal.h_110 decimal.h copy /Y ..\LIBRARY\icx110libbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 111 **************************" echo "" echo "" copy /Y main.c_111 main.c copy /Y decimal.h_111 decimal.h copy /Y ..\LIBRARY\icx111libbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe del main.exe main.c decimal.h del ..\LIBRARY\libbid.lib echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 000b **************************" echo "" echo "" copy /Y main.c_000 main.c copy /Y decimal.h_000 decimal.h copy /Y ..\LIBRARY\icx000blibbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 001b **************************" echo "" echo "" copy /Y main.c_001 main.c copy /Y decimal.h_001 decimal.h copy /Y ..\LIBRARY\icx001blibbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 
010b **************************" echo "" echo "" copy /Y main.c_010 main.c copy /Y decimal.h_010 decimal.h copy /Y ..\LIBRARY\icx010blibbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 011b **************************" echo "" echo "" copy /Y main.c_011 main.c copy /Y decimal.h_011 decimal.h copy /Y ..\LIBRARY\icx011blibbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 100b **************************" echo "" echo "" copy /Y main.c_100 main.c copy /Y decimal.h_100 decimal.h copy /Y ..\LIBRARY\icx100blibbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 101b **************************" echo "" echo "" copy /Y main.c_101 main.c copy /Y decimal.h_101 decimal.h copy /Y ..\LIBRARY\icx101blibbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 110b **************************" echo "" echo "" copy /Y main.c_110 main.c copy /Y decimal.h_110 decimal.h copy /Y ..\LIBRARY\icx110blibbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 111b **************************" echo "" echo "" copy /Y main.c_111 main.c copy /Y decimal.h_111 decimal.h copy /Y ..\LIBRARY\icx111blibbid.lib ..\LIBRARY\libbid.lib icx main.c /DWINDOWS ..\LIBRARY\libbid.lib %1 main.exe del main.exe main.c decimal.h del ..\LIBRARY\libbid.lib EXAMPLES/decimal.h_0000000644€­ Q00042560000000624314616534611014337 0ustar aakkasintelall/****************************************************************************** Copyright (c) 2007-2024, Intel Corp. All rights reserved. 
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/ // 000: // 0 arguments passed by value (except fpsf) // 0 rounding mode passed as argument // 0 pointer to status flags passed as argument #ifdef WINDOWS #define LX "%I64x" #else #ifdef HPUX_OS #define LX "%llx" #else #define LX "%Lx" #endif #endif /* basic decimal floating-point types */ #if defined _MSC_VER #if defined _M_IX86 && !defined __INTEL_COMPILER // Win IA-32, MS compiler #define ALIGN(n) #else #define ALIGN(n) __declspec(align(n)) #endif #else #define ALIGN(n) __attribute__ ((aligned(n))) #endif typedef unsigned int Decimal32; typedef unsigned long long Decimal64; typedef struct ALIGN(16) { unsigned long long w[2]; } Decimal128; /* rounding modes */ typedef enum _IDEC_roundingmode { _IDEC_nearesteven = 0, _IDEC_downward = 1, _IDEC_upward = 2, _IDEC_towardzero = 3, _IDEC_nearestaway = 4, _IDEC_dflround = _IDEC_nearesteven } _IDEC_roundingmode; typedef unsigned int _IDEC_round; /* exception flags */ typedef enum _IDEC_flagbits { _IDEC_invalid = 0x01, _IDEC_zerodivide = 0x04, _IDEC_overflow = 0x08, _IDEC_underflow = 0x10, _IDEC_inexact = 0x20, _IDEC_allflagsclear = 0x00 } _IDEC_flagbits; typedef unsigned int _IDEC_flags; // could be a struct with diagnostic info extern Decimal128 __bid128_mul ( Decimal128, Decimal128, _IDEC_round, _IDEC_flags * ); #if BID_BIG_ENDIAN #define HIGH_128W 0 #define LOW_128W 1 #else #define HIGH_128W 1 #define LOW_128W 0 #endif EXAMPLES/decimal.h_0010000644€­ Q00042560000000670214616534611014340 0ustar aakkasintelall/****************************************************************************** Copyright (c) 2007-2024, Intel Corp. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/ // 001: // 0 arguments passed by value (except fpsf) // 0 rounding mode passed as argument // 1 status flags in global variable #ifdef WINDOWS #define LX "%I64x" #else #ifdef HPUX_OS #define LX "%llx" #else #define LX "%Lx" #endif #endif #ifndef BID_THREAD #if defined (_MSC_VER) //Windows #define BID_THREAD __declspec(thread) #else #if !defined(__APPLE__) //Linux, FreeBSD #define BID_THREAD __thread #else //Mac OSX, TBD #define BID_THREAD #endif //Linux or Mac #endif //Windows #endif //BID_THREAD /* basic decimal floating-point types */ #if defined _MSC_VER #if defined _M_IX86 && !defined __INTEL_COMPILER // Win IA-32, MS compiler #define ALIGN(n) #else #define ALIGN(n) __declspec(align(n)) #endif #else #define ALIGN(n) __attribute__ ((aligned(n))) #endif typedef unsigned int Decimal32; typedef unsigned long long Decimal64; typedef struct ALIGN(16) { unsigned long long w[2]; } Decimal128; /* rounding modes */ typedef enum _IDEC_roundingmode { _IDEC_nearesteven = 0, _IDEC_downward = 1, _IDEC_upward = 2, _IDEC_towardzero = 3, _IDEC_nearestaway = 4, _IDEC_dflround = _IDEC_nearesteven } _IDEC_roundingmode; typedef unsigned int _IDEC_round; /* exception flags */ typedef enum _IDEC_flagbits { _IDEC_invalid = 0x01, _IDEC_zerodivide = 0x04, _IDEC_overflow = 0x08, _IDEC_underflow = 0x10, _IDEC_inexact = 0x20, _IDEC_allflagsclear = 0x00 } _IDEC_flagbits; typedef unsigned int _IDEC_flags; // could be a struct with diagnostic info extern BID_THREAD _IDEC_flags __bid_IDEC_glbflags; extern Decimal128 __bid128_mul ( Decimal128, Decimal128, _IDEC_round ); #if BID_BIG_ENDIAN #define HIGH_128W 0 #define LOW_128W 1 #else #define HIGH_128W 1 #define LOW_128W 0 #endif EXAMPLES/decimal.h_0100000644€­ Q00042560000000672414616534611014344 0ustar aakkasintelall/****************************************************************************** Copyright (c) 2007-2024, Intel Corp. All rights reserved. 
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/ // 010: // 0 arguments passed by value (except fpsf) // 1 rounding mode passed in global variable // 0 pointer to status flags passed as argument #ifdef WINDOWS #define LX "%I64x" #else #ifdef HPUX_OS #define LX "%llx" #else #define LX "%Lx" #endif #endif #ifndef BID_THREAD #if defined (_MSC_VER) //Windows #define BID_THREAD __declspec(thread) #else #if !defined(__APPLE__) //Linux, FreeBSD #define BID_THREAD __thread #else //Mac OSX, TBD #define BID_THREAD #endif //Linux or Mac #endif //Windows #endif //BID_THREAD /* basic decimal floating-point types */ #if defined _MSC_VER #if defined _M_IX86 && !defined __INTEL_COMPILER // Win IA-32, MS compiler #define ALIGN(n) #else #define ALIGN(n) __declspec(align(n)) #endif #else #define ALIGN(n) __attribute__ ((aligned(n))) #endif typedef unsigned int Decimal32; typedef unsigned long long Decimal64; typedef struct ALIGN(16) { unsigned long long w[2]; } Decimal128; /* rounding modes */ typedef enum _IDEC_roundingmode { _IDEC_nearesteven = 0, _IDEC_downward = 1, _IDEC_upward = 2, _IDEC_towardzero = 3, _IDEC_nearestaway = 4, _IDEC_dflround = _IDEC_nearesteven } _IDEC_roundingmode; typedef unsigned int _IDEC_round; extern BID_THREAD _IDEC_round __bid_IDEC_glbround; /* exception flags */ typedef enum _IDEC_flagbits { _IDEC_invalid = 0x01, _IDEC_zerodivide = 0x04, _IDEC_overflow = 0x08, _IDEC_underflow = 0x10, _IDEC_inexact = 0x20, _IDEC_allflagsclear = 0x00 } _IDEC_flagbits; typedef unsigned int _IDEC_flags; // could be a struct with diagnostic info extern Decimal128 __bid128_mul ( Decimal128, Decimal128, _IDEC_flags * ); #if BID_BIG_ENDIAN #define HIGH_128W 0 #define LOW_128W 1 #else #define HIGH_128W 1 #define LOW_128W 0 #endif EXAMPLES/decimal.h_0110000644€­ Q00042560000000675114616534611014345 0ustar aakkasintelall/****************************************************************************** Copyright (c) 2007-2024, Intel Corp. 
All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/

// 011:
//   0 arguments passed by value (except fpsf)
//   1 rounding mode passed in global variable
//   1 status flags in global variable

#ifdef WINDOWS
#define LX "%I64x"
#else
#ifdef HPUX_OS
#define LX "%llx"
#else
#define LX "%Lx"
#endif
#endif

#ifndef BID_THREAD
#if defined (_MSC_VER) //Windows
#define BID_THREAD __declspec(thread)
#else
#if !defined(__APPLE__) //Linux, FreeBSD
#define BID_THREAD __thread
#else //Mac OSX, TBD
#define BID_THREAD
#endif //Linux or Mac
#endif //Windows
#endif //BID_THREAD

/* basic decimal floating-point types */

#if defined _MSC_VER
#if defined _M_IX86 && !defined __INTEL_COMPILER // Win IA-32, MS compiler
#define ALIGN(n)
#else
#define ALIGN(n) __declspec(align(n))
#endif
#else
#define ALIGN(n) __attribute__ ((aligned(n)))
#endif

typedef unsigned int Decimal32;
typedef unsigned long long Decimal64;
typedef struct ALIGN(16) { unsigned long long w[2]; } Decimal128;

/* rounding modes */

typedef enum _IDEC_roundingmode {
    _IDEC_nearesteven = 0,
    _IDEC_downward = 1,
    _IDEC_upward = 2,
    _IDEC_towardzero = 3,
    _IDEC_nearestaway = 4,
    _IDEC_dflround = _IDEC_nearesteven
} _IDEC_roundingmode;

typedef unsigned int _IDEC_round;
extern BID_THREAD _IDEC_round __bid_IDEC_glbround;

/* exception flags */

typedef enum _IDEC_flagbits {
    _IDEC_invalid = 0x01,
    _IDEC_zerodivide = 0x04,
    _IDEC_overflow = 0x08,
    _IDEC_underflow = 0x10,
    _IDEC_inexact = 0x20,
    _IDEC_allflagsclear = 0x00
} _IDEC_flagbits;

typedef unsigned int _IDEC_flags; // could be a struct with diagnostic info
extern BID_THREAD _IDEC_flags __bid_IDEC_glbflags;

extern Decimal128 __bid128_mul ( Decimal128, Decimal128 );

#if BID_BIG_ENDIAN
#define HIGH_128W 0
#define LOW_128W 1
#else
#define HIGH_128W 1
#define LOW_128W 0
#endif

EXAMPLES/RUNLINUXINTEL64_CLANG
echo "BEGIN BUILDING AND RUNNING EXAMPLES IN LINUX..."
rm linuxout a.out
./linuxbuild_clang > linuxout
# grep PASS linuxout
cat linuxout
grep FAIL linuxout
rm linuxout main.c decimal.h
echo "END BUILDING AND RUNNING EXAMPLES IN LINUX..."
echo "THE TESTS PASSED IF THE WORD 'FAIL' WAS NOT PRINTED ABOVE"

EXAMPLES/RUNLINUXINTEL64_ICC
echo "BEGIN BUILDING AND RUNNING EXAMPLES IN LINUX..."
rm linuxout a.out
./linuxbuild_icc > linuxout
# grep PASS linuxout
cat linuxout
grep FAIL linuxout
rm linuxout main.c decimal.h
echo "END BUILDING AND RUNNING EXAMPLES IN LINUX..."
echo "THE TESTS PASSED IF THE WORD 'FAIL' WAS NOT PRINTED ABOVE"

EXAMPLES/RUNLINUXINTEL64_ICX
echo "BEGIN BUILDING AND RUNNING EXAMPLES IN LINUX..."
rm linuxout a.out
./linuxbuild_icx > linuxout
# grep PASS linuxout
cat linuxout
grep FAIL linuxout
rm linuxout main.c decimal.h
echo "END BUILDING AND RUNNING EXAMPLES IN LINUX..."
echo "THE TESTS PASSED IF THE WORD 'FAIL' WAS NOT PRINTED ABOVE"

EXAMPLES/RUNWINDOWSINTEL64_CLANG.bat
echo "BEGIN BUILDING AND RUNNING EXAMPLES IN WINDOWS..."
call windowsbuild_clang.bat
echo "END BUILDING AND RUNNING EXAMPLES IN WINDOWS..."
echo "THE TESTS PASSED IF THE WORD 'FAIL' WAS NOT PRINTED ABOVE"

EXAMPLES/RUNWINDOWSINTEL64_ICL.bat
echo "BEGIN BUILDING AND RUNNING EXAMPLES IN WINDOWS..."
call windowsbuild_icl.bat
echo "END BUILDING AND RUNNING EXAMPLES IN WINDOWS..."
echo "THE TESTS PASSED IF THE WORD 'FAIL' WAS NOT PRINTED ABOVE"

EXAMPLES/RUNWINDOWSINTEL64_ICX.bat
echo "BEGIN BUILDING AND RUNNING EXAMPLES IN WINDOWS..."
call windowsbuild_icx.bat
echo "END BUILDING AND RUNNING EXAMPLES IN WINDOWS..."
echo "THE TESTS PASSED IF THE WORD 'FAIL' WAS NOT PRINTED ABOVE"

EXAMPLES/decimal.h_100
/******************************************************************************
Copyright (c) 2007-2024, Intel Corp. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
******************************************************************************/

// 100:
//   1 arguments passed by reference
//   0 rounding mode passed as argument
//   0 pointer to status flags passed as argument

#ifdef WINDOWS
#define LX "%I64x"
#else
#ifdef HPUX_OS
#define LX "%llx"
#else
#define LX "%Lx"
#endif
#endif

/* basic decimal floating-point types */

#if defined _MSC_VER
#if defined _M_IX86 && !defined __INTEL_COMPILER // Win IA-32, MS compiler
#define ALIGN(n)
#else
#define ALIGN(n) __declspec(align(n))
#endif
#else
#define ALIGN(n) __attribute__ ((aligned(n)))
#endif

typedef unsigned int Decimal32;
typedef unsigned long long Decimal64;
typedef struct ALIGN(16) { unsigned long long w[2]; } Decimal128;

/* rounding modes */

typedef enum _IDEC_roundingmode {
    _IDEC_nearesteven = 0,
    _IDEC_downward = 1,
    _IDEC_upward = 2,
    _IDEC_towardzero = 3,
    _IDEC_nearestaway = 4,
    _IDEC_dflround = _IDEC_nearesteven
} _IDEC_roundingmode;

typedef unsigned int _IDEC_round;

/* exception flags */

typedef enum _IDEC_flagbits {
    _IDEC_invalid = 0x01,
    _IDEC_zerodivide = 0x04,
    _IDEC_overflow = 0x08,
    _IDEC_underflow = 0x10,
    _IDEC_inexact = 0x20,
    _IDEC_allflagsclear = 0x00
} _IDEC_flagbits;

typedef unsigned int _IDEC_flags; // could be a struct with diagnostic info

extern void __bid128_mul ( Decimal128 *, Decimal128 *, Decimal128 *, _IDEC_round *, _IDEC_flags * );

#if BID_BIG_ENDIAN
#define HIGH_128W 0
#define LOW_128W 1
#else
#define HIGH_128W 1
#define LOW_128W 0
#endif

EXAMPLES/linuxbuild_gcc
echo ""
echo ""
echo "***************** RUNNING EXAMPLE FOR gcc 000 **************************"
echo ""
echo ""
cp main.c_000 main.c
cp decimal.h_000 decimal.h
cp ../LIBRARY/gcc000libbid.a ../LIBRARY/libbid.a
gcc $1 main.c ../LIBRARY/libbid.a
./a.out
rm a.out
echo ""
echo ""
echo "***************** RUNNING EXAMPLE FOR gcc 001 **************************"
echo ""
echo ""
cp main.c_001 main.c
cp decimal.h_001 decimal.h
cp
../LIBRARY/gcc001libbid.a ../LIBRARY/libbid.a gcc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR gcc 010 **************************" echo "" echo "" cp main.c_010 main.c cp decimal.h_010 decimal.h cp ../LIBRARY/gcc010libbid.a ../LIBRARY/libbid.a gcc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR gcc 011 **************************" echo "" echo "" cp main.c_011 main.c cp decimal.h_011 decimal.h cp ../LIBRARY/gcc011libbid.a ../LIBRARY/libbid.a gcc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR gcc 100 **************************" echo "" echo "" cp main.c_100 main.c cp decimal.h_100 decimal.h cp ../LIBRARY/gcc100libbid.a ../LIBRARY/libbid.a gcc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR gcc 101 **************************" echo "" echo "" cp main.c_101 main.c cp decimal.h_101 decimal.h cp ../LIBRARY/gcc101libbid.a ../LIBRARY/libbid.a gcc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR gcc 110 **************************" echo "" echo "" cp main.c_110 main.c cp decimal.h_110 decimal.h cp ../LIBRARY/gcc110libbid.a ../LIBRARY/libbid.a gcc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR gcc 111 **************************" echo "" echo "" cp main.c_111 main.c cp decimal.h_111 decimal.h cp ../LIBRARY/gcc111libbid.a ../LIBRARY/libbid.a gcc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out rm ../LIBRARY/libbid.a echo "" echo "" echo "***************** RUNNING EXAMPLE FOR gcc 000b **************************" echo "" echo "" cp main.c_000 main.c cp decimal.h_000 decimal.h cp ../LIBRARY/gcc000blibbid.a ../LIBRARY/libbid.a gcc $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out echo "" echo "" echo "***************** RUNNING 
EXAMPLE FOR gcc 001b **************************" echo "" echo "" cp main.c_001 main.c cp decimal.h_001 decimal.h cp ../LIBRARY/gcc001blibbid.a ../LIBRARY/libbid.a gcc $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR gcc 010b **************************" echo "" echo "" cp main.c_010 main.c cp decimal.h_010 decimal.h cp ../LIBRARY/gcc010blibbid.a ../LIBRARY/libbid.a gcc $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR gcc 011b **************************" echo "" echo "" cp main.c_011 main.c cp decimal.h_011 decimal.h cp ../LIBRARY/gcc011blibbid.a ../LIBRARY/libbid.a gcc $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR gcc 100b **************************" echo "" echo "" cp main.c_100 main.c cp decimal.h_100 decimal.h cp ../LIBRARY/gcc100blibbid.a ../LIBRARY/libbid.a gcc $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR gcc 101b **************************" echo "" echo "" cp main.c_101 main.c cp decimal.h_101 decimal.h cp ../LIBRARY/gcc101blibbid.a ../LIBRARY/libbid.a gcc $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR gcc 110b **************************" echo "" echo "" cp main.c_110 main.c cp decimal.h_110 decimal.h cp ../LIBRARY/gcc110blibbid.a ../LIBRARY/libbid.a gcc $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR gcc 111b **************************" echo "" echo "" cp main.c_111 main.c cp decimal.h_111 decimal.h cp ../LIBRARY/gcc111blibbid.a ../LIBRARY/libbid.a gcc $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out rm ../LIBRARY/libbid.a EXAMPLES/linuxbuild_icc0000755€­ Q00042560000001020014616534611015120 0ustar aakkasintelallecho "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 000 
**************************" echo "" echo "" cp main.c_000 main.c cp decimal.h_000 decimal.h cp ../LIBRARY/icc000libbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 001 **************************" echo "" echo "" cp main.c_001 main.c cp decimal.h_001 decimal.h cp ../LIBRARY/icc001libbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 010 **************************" echo "" echo "" cp main.c_010 main.c cp decimal.h_010 decimal.h cp ../LIBRARY/icc010libbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 011 **************************" echo "" echo "" cp main.c_011 main.c cp decimal.h_011 decimal.h cp ../LIBRARY/icc011libbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 100 **************************" echo "" echo "" cp main.c_100 main.c cp decimal.h_100 decimal.h cp ../LIBRARY/icc100libbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 101 **************************" echo "" echo "" cp main.c_101 main.c cp decimal.h_101 decimal.h cp ../LIBRARY/icc101libbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 110 **************************" echo "" echo "" cp main.c_110 main.c cp decimal.h_110 decimal.h cp ../LIBRARY/icc110libbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 111 **************************" echo "" echo "" cp main.c_111 main.c cp decimal.h_111 decimal.h cp ../LIBRARY/icc111libbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a 
./a.out rm a.out rm ../LIBRARY/libbid.a echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 000b **************************" echo "" echo "" cp main.c_000 main.c cp decimal.h_000 decimal.h cp ../LIBRARY/icc000blibbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 001b **************************" echo "" echo "" cp main.c_001 main.c cp decimal.h_001 decimal.h cp ../LIBRARY/icc001blibbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 010b **************************" echo "" echo "" cp main.c_010 main.c cp decimal.h_010 decimal.h cp ../LIBRARY/icc010blibbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 011b **************************" echo "" echo "" cp main.c_011 main.c cp decimal.h_011 decimal.h cp ../LIBRARY/icc011blibbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 100b **************************" echo "" echo "" cp main.c_100 main.c cp decimal.h_100 decimal.h cp ../LIBRARY/icc100blibbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 101b **************************" echo "" echo "" cp main.c_101 main.c cp decimal.h_101 decimal.h cp ../LIBRARY/icc101blibbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 110b **************************" echo "" echo "" cp main.c_110 main.c cp decimal.h_110 decimal.h cp ../LIBRARY/icc110blibbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icc 111b **************************" echo "" echo "" cp 
main.c_111 main.c cp decimal.h_111 decimal.h cp ../LIBRARY/icc111blibbid.a ../LIBRARY/libbid.a icc $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out rm ../LIBRARY/libbid.a EXAMPLES/RUNWINDOWSINTEL64_CL.bat0000755€­ Q00042560000000031214616534611015736 0ustar aakkasintelallecho "BEGIN BUILDING AND RUNNING EXAMPLES IN WINDOWS..." call windowsbuild_cl.bat echo "END BUILDING AND RUNNING EXAMPLES IN WINDOWS..." echo "THE TESTS PASSED IF THE WORD 'FAIL' WAS NOT PRINTED ABOVE" EXAMPLES/linuxbuild_icx0000755€­ Q00042560000001020014616534611015145 0ustar aakkasintelallecho "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 000 **************************" echo "" echo "" cp main.c_000 main.c cp decimal.h_000 decimal.h cp ../LIBRARY/icx000libbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 001 **************************" echo "" echo "" cp main.c_001 main.c cp decimal.h_001 decimal.h cp ../LIBRARY/icx001libbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 010 **************************" echo "" echo "" cp main.c_010 main.c cp decimal.h_010 decimal.h cp ../LIBRARY/icx010libbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 011 **************************" echo "" echo "" cp main.c_011 main.c cp decimal.h_011 decimal.h cp ../LIBRARY/icx011libbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 100 **************************" echo "" echo "" cp main.c_100 main.c cp decimal.h_100 decimal.h cp ../LIBRARY/icx100libbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 101 **************************" echo "" echo "" cp main.c_101 main.c cp 
decimal.h_101 decimal.h cp ../LIBRARY/icx101libbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 110 **************************" echo "" echo "" cp main.c_110 main.c cp decimal.h_110 decimal.h cp ../LIBRARY/icx110libbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 111 **************************" echo "" echo "" cp main.c_111 main.c cp decimal.h_111 decimal.h cp ../LIBRARY/icx111libbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out rm ../LIBRARY/libbid.a echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 000b **************************" echo "" echo "" cp main.c_000 main.c cp decimal.h_000 decimal.h cp ../LIBRARY/icx000blibbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 001b **************************" echo "" echo "" cp main.c_001 main.c cp decimal.h_001 decimal.h cp ../LIBRARY/icx001blibbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 010b **************************" echo "" echo "" cp main.c_010 main.c cp decimal.h_010 decimal.h cp ../LIBRARY/icx010blibbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 011b **************************" echo "" echo "" cp main.c_011 main.c cp decimal.h_011 decimal.h cp ../LIBRARY/icx011blibbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 100b **************************" echo "" echo "" cp main.c_100 main.c cp decimal.h_100 decimal.h cp ../LIBRARY/icx100blibbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo 
"***************** RUNNING EXAMPLE FOR icx 101b **************************" echo "" echo "" cp main.c_101 main.c cp decimal.h_101 decimal.h cp ../LIBRARY/icx101blibbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 110b **************************" echo "" echo "" cp main.c_110 main.c cp decimal.h_110 decimal.h cp ../LIBRARY/icx110blibbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR icx 111b **************************" echo "" echo "" cp main.c_111 main.c cp decimal.h_111 decimal.h cp ../LIBRARY/icx111blibbid.a ../LIBRARY/libbid.a icx $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out rm ../LIBRARY/libbid.a EXAMPLES/linuxbuild_clang0000755€­ Q00042560000001040014616534611015450 0ustar aakkasintelallecho "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 000 **************************" echo "" echo "" cp main.c_000 main.c cp decimal.h_000 decimal.h cp ../LIBRARY/clang000libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 001 **************************" echo "" echo "" cp main.c_001 main.c cp decimal.h_001 decimal.h cp ../LIBRARY/clang001libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 010 **************************" echo "" echo "" cp main.c_010 main.c cp decimal.h_010 decimal.h cp ../LIBRARY/clang010libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 011 **************************" echo "" echo "" cp main.c_011 main.c cp decimal.h_011 decimal.h cp ../LIBRARY/clang011libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING 
EXAMPLE FOR clang 100 **************************" echo "" echo "" cp main.c_100 main.c cp decimal.h_100 decimal.h cp ../LIBRARY/clang100libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 101 **************************" echo "" echo "" cp main.c_101 main.c cp decimal.h_101 decimal.h cp ../LIBRARY/clang101libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 110 **************************" echo "" echo "" cp main.c_110 main.c cp decimal.h_110 decimal.h cp ../LIBRARY/clang110libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 111 **************************" echo "" echo "" cp main.c_111 main.c cp decimal.h_111 decimal.h cp ../LIBRARY/clang111libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out rm ../LIBRARY/libbid.a echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 000b **************************" echo "" echo "" cp main.c_000 main.c cp decimal.h_000 decimal.h cp ../LIBRARY/clang000blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 001b **************************" echo "" echo "" cp main.c_001 main.c cp decimal.h_001 decimal.h cp ../LIBRARY/clang001blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 010b **************************" echo "" echo "" cp main.c_010 main.c cp decimal.h_010 decimal.h cp ../LIBRARY/clang010blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 011b **************************" echo "" echo "" cp main.c_011 main.c cp 
decimal.h_011 decimal.h cp ../LIBRARY/clang011blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 100b **************************" echo "" echo "" cp main.c_100 main.c cp decimal.h_100 decimal.h cp ../LIBRARY/clang100blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 101b **************************" echo "" echo "" cp main.c_101 main.c cp decimal.h_101 decimal.h cp ../LIBRARY/clang101blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 110b **************************" echo "" echo "" cp main.c_110 main.c cp decimal.h_110 decimal.h cp ../LIBRARY/clang110blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 111b **************************" echo "" echo "" cp main.c_111 main.c cp decimal.h_111 decimal.h cp ../LIBRARY/clang111blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a -lm ./a.out rm a.out rm ../LIBRARY/libbid.a EXAMPLES/macbuild0000755€­ Q00042560000001031214616534611013707 0ustar aakkasintelallecho "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 000 **************************" echo "" echo "" cp main.c_000 main.c cp decimal.h_000 decimal.h cp ../LIBRARY/clang000libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 001 **************************" echo "" echo "" cp main.c_001 main.c cp decimal.h_001 decimal.h cp ../LIBRARY/clang001libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 010 **************************" echo "" echo "" cp main.c_010 main.c cp 
decimal.h_010 decimal.h cp ../LIBRARY/clang010libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 011 **************************" echo "" echo "" cp main.c_011 main.c cp decimal.h_011 decimal.h cp ../LIBRARY/clang011libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 100 **************************" echo "" echo "" cp main.c_100 main.c cp decimal.h_100 decimal.h cp ../LIBRARY/clang100libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 101 **************************" echo "" echo "" cp main.c_101 main.c cp decimal.h_101 decimal.h cp ../LIBRARY/clang101libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 110 **************************" echo "" echo "" cp main.c_110 main.c cp decimal.h_110 decimal.h cp ../LIBRARY/clang110libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 111 **************************" echo "" echo "" cp main.c_111 main.c cp decimal.h_111 decimal.h cp ../LIBRARY/clang111libbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 000b **************************" echo "" echo "" cp main.c_000 main.c cp decimal.h_000 decimal.h cp ../LIBRARY/clang000blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 001b **************************" echo "" echo "" cp main.c_001 main.c cp decimal.h_001 decimal.h cp ../LIBRARY/clang001blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo 
"" echo "" echo "***************** RUNNING EXAMPLE FOR clang 010b **************************" echo "" echo "" cp main.c_010 main.c cp decimal.h_010 decimal.h cp ../LIBRARY/clang010blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 011b **************************" echo "" echo "" cp main.c_011 main.c cp decimal.h_011 decimal.h cp ../LIBRARY/clang011blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 100b **************************" echo "" echo "" cp main.c_100 main.c cp decimal.h_100 decimal.h cp ../LIBRARY/clang100blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 101b **************************" echo "" echo "" cp main.c_101 main.c cp decimal.h_101 decimal.h cp ../LIBRARY/clang101blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 110b **************************" echo "" echo "" cp main.c_110 main.c cp decimal.h_110 decimal.h cp ../LIBRARY/clang110blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out echo "" echo "" echo "***************** RUNNING EXAMPLE FOR clang 111b **************************" echo "" echo "" cp main.c_111 main.c cp decimal.h_111 decimal.h cp ../LIBRARY/clang111blibbid.a ../LIBRARY/libbid.a clang $1 main.c ../LIBRARY/libbid.a ./a.out rm a.out rm ../LIBRARY/libbid.a EXAMPLES/decimal.h_1010000644€­ Q00042560000000671414616534611014344 0ustar aakkasintelall/****************************************************************************** Copyright (c) 2007-2024, Intel Corp. All rights reserved. 
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/

// 101:
//   1 arguments passed by reference
//   0 rounding mode passed as argument
//   1 status flags in global variable

#ifdef WINDOWS
  #define LX "%I64x"
#else
  #ifdef HPUX_OS
    #define LX "%llx"
  #else
    #define LX "%Lx"
  #endif
#endif

#ifndef BID_THREAD
  #if defined (_MSC_VER) //Windows
    #define BID_THREAD __declspec(thread)
  #else
    #if !defined(__APPLE__) //Linux, FreeBSD
      #define BID_THREAD __thread
    #else //Mac OSX, TBD
      #define BID_THREAD
    #endif //Linux or Mac
  #endif //Windows
#endif //BID_THREAD

/* basic decimal floating-point types */

#if defined _MSC_VER
  #if defined _M_IX86 && !defined __INTEL_COMPILER // Win IA-32, MS compiler
    #define ALIGN(n)
  #else
    #define ALIGN(n) __declspec(align(n))
  #endif
#else
  #define ALIGN(n) __attribute__ ((aligned(n)))
#endif

typedef unsigned int Decimal32;
typedef unsigned long long Decimal64;
typedef struct ALIGN(16) { unsigned long long w[2]; } Decimal128;

/* rounding modes */

typedef enum _IDEC_roundingmode {
  _IDEC_nearesteven = 0,
  _IDEC_downward = 1,
  _IDEC_upward = 2,
  _IDEC_towardzero = 3,
  _IDEC_nearestaway = 4,
  _IDEC_dflround = _IDEC_nearesteven
} _IDEC_roundingmode;
typedef unsigned int _IDEC_round;

/* exception flags */

typedef enum _IDEC_flagbits {
  _IDEC_invalid = 0x01,
  _IDEC_zerodivide = 0x04,
  _IDEC_overflow = 0x08,
  _IDEC_underflow = 0x10,
  _IDEC_inexact = 0x20,
  _IDEC_allflagsclear = 0x00
} _IDEC_flagbits;
typedef unsigned int _IDEC_flags; // could be a struct with diagnostic info
extern BID_THREAD _IDEC_flags __bid_IDEC_glbflags;

extern void __bid128_mul (
  Decimal128 *, Decimal128 *, Decimal128 *, _IDEC_round * );

#if BID_BIG_ENDIAN
  #define HIGH_128W 0
  #define LOW_128W 1
#else
  #define HIGH_128W 1
  #define LOW_128W 0
#endif
EXAMPLES/RUNOSXINTEL64
echo "BEGIN BUILDING AND RUNNING EXAMPLES IN MAC OS X..."
rm a.out macout
./macbuild -m64 > macout
# grep PASS macout
cat macout
grep FAIL macout
rm macout main.c decimal.h
echo "END BUILDING AND RUNNING EXAMPLES IN MAC OS X..."
echo "(THE TESTS PASSED IF THE WORD 'FAIL' WAS NOT PRINTED ABOVE)"
EXAMPLES/decimal.h_110
/******************************************************************************
Copyright (c) 2007-2024, Intel Corp. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
******************************************************************************/

// 110:
//   1 arguments passed by reference
//   1 rounding mode passed in global variable
//   0 pointer to status flags passed as argument

#ifdef WINDOWS
  #define LX "%I64x"
#else
  #ifdef HPUX_OS
    #define LX "%llx"
  #else
    #define LX "%Lx"
  #endif
#endif

#ifndef BID_THREAD
  #if defined (_MSC_VER) //Windows
    #define BID_THREAD __declspec(thread)
  #else
    #if !defined(__APPLE__) //Linux, FreeBSD
      #define BID_THREAD __thread
    #else //Mac OSX, TBD
      #define BID_THREAD
    #endif //Linux or Mac
  #endif //Windows
#endif //BID_THREAD

/* basic decimal floating-point types */

#if defined _MSC_VER
  #if defined _M_IX86 && !defined __INTEL_COMPILER // Win IA-32, MS compiler
    #define ALIGN(n)
  #else
    #define ALIGN(n) __declspec(align(n))
  #endif
#else
  #define ALIGN(n) __attribute__ ((aligned(n)))
#endif

typedef unsigned int Decimal32;
typedef unsigned long long Decimal64;
typedef struct ALIGN(16) { unsigned long long w[2]; } Decimal128;

/* rounding modes */

typedef enum _IDEC_roundingmode {
  _IDEC_nearesteven = 0,
  _IDEC_downward = 1,
  _IDEC_upward = 2,
  _IDEC_towardzero = 3,
  _IDEC_nearestaway = 4,
  _IDEC_dflround = _IDEC_nearesteven
} _IDEC_roundingmode;
typedef unsigned int _IDEC_round;
extern BID_THREAD _IDEC_round __bid_IDEC_glbround;

/* exception flags */

typedef enum _IDEC_flagbits {
  _IDEC_invalid = 0x01,
  _IDEC_zerodivide = 0x04,
  _IDEC_overflow = 0x08,
  _IDEC_underflow = 0x10,
  _IDEC_inexact = 0x20,
  _IDEC_allflagsclear = 0x00
} _IDEC_flagbits;
typedef unsigned int _IDEC_flags; // could be a struct with diagnostic info

extern void __bid128_mul (
  Decimal128 *, Decimal128 *, Decimal128 *, _IDEC_flags * );

#if BID_BIG_ENDIAN
  #define HIGH_128W 0
  #define LOW_128W 1
#else
  #define HIGH_128W 1
  #define LOW_128W 0
#endif
EXAMPLES/decimal.h_111
/******************************************************************************
Copyright (c) 2007-2024, Intel Corp.
All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/

// 111:
//   1 arguments passed by reference
//   1 rounding mode passed in global variable
//   1 status flags in global variable

#ifdef WINDOWS
  #define LX "%I64x"
#else
  #ifdef HPUX_OS
    #define LX "%llx"
  #else
    #define LX "%Lx"
  #endif
#endif

#ifndef BID_THREAD
  #if defined (_MSC_VER) //Windows
    #define BID_THREAD __declspec(thread)
  #else
    #if !defined(__APPLE__) //Linux, FreeBSD
      #define BID_THREAD __thread
    #else //Mac OSX, TBD
      #define BID_THREAD
    #endif //Linux or Mac
  #endif //Windows
#endif //BID_THREAD

/* basic decimal floating-point types */

#if defined _MSC_VER
  #if defined _M_IX86 && !defined __INTEL_COMPILER // Win IA-32, MS compiler
    #define ALIGN(n)
  #else
    #define ALIGN(n) __declspec(align(n))
  #endif
#else
  #define ALIGN(n) __attribute__ ((aligned(n)))
#endif

typedef unsigned int Decimal32;
typedef unsigned long long Decimal64;
typedef struct ALIGN(16) { unsigned long long w[2]; } Decimal128;

/* rounding modes */

typedef enum _IDEC_roundingmode {
  _IDEC_nearesteven = 0,
  _IDEC_downward = 1,
  _IDEC_upward = 2,
  _IDEC_towardzero = 3,
  _IDEC_nearestaway = 4,
  _IDEC_dflround = _IDEC_nearesteven
} _IDEC_roundingmode;
typedef unsigned int _IDEC_round;
extern BID_THREAD _IDEC_round __bid_IDEC_glbround;

/* exception flags */

typedef enum _IDEC_flagbits {
  _IDEC_invalid = 0x01,
  _IDEC_zerodivide = 0x04,
  _IDEC_overflow = 0x08,
  _IDEC_underflow = 0x10,
  _IDEC_inexact = 0x20,
  _IDEC_allflagsclear = 0x00
} _IDEC_flagbits;
typedef unsigned int _IDEC_flags; // could be a struct with diagnostic info
extern BID_THREAD _IDEC_flags __bid_IDEC_glbflags;

extern void __bid128_mul (
  Decimal128 *, Decimal128 *, Decimal128 * );

#if BID_BIG_ENDIAN
  #define HIGH_128W 0
  #define LOW_128W 1
#else
  #define HIGH_128W 1
  #define LOW_128W 0
#endif
EXAMPLES/main.c_000
/******************************************************************************
Copyright (c) 2007-2024,
Intel Corp. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/

// 000:
//   0 arguments passed by value (except fpsf)
//   0 rounding mode passed as argument
//   0 pointer to status flags passed as argument

#include <stdio.h>
#include <stdlib.h>
#include "decimal.h"

int main () {
  Decimal128 x, y, z;
  _IDEC_round my_rnd_mode = _IDEC_dflround;
  _IDEC_flags my_fpsf = _IDEC_allflagsclear;

  printf ("Begin Decimal Floating-Point Sanity Check\n");

  // 2 * 3 = 6
  my_rnd_mode = _IDEC_nearesteven;
  my_fpsf = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x3040000000000000ull;
  x.w[LOW_128W] = 0x0000000000000002ull; // x = 2
  y.w[HIGH_128W] = 0x3040000000000000ull;
  y.w[LOW_128W] = 0x0000000000000003ull; // y = 3
  z = __bid128_mul (x, y, my_rnd_mode, &my_fpsf);
  if (z.w[HIGH_128W] != 0x3040000000000000ull ||
      z.w[LOW_128W] != 0x0000000000000006ull ||
      my_fpsf != _IDEC_allflagsclear) {
    printf ("RECEIVED z="LX" "LX" my_fpsf=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], my_fpsf);
    printf ("EXPECTED z=3040000000000000 0000000000000006 my_fpsf=00000000\n");
    printf ("ERROR: TEST CASE 1 FOR __bid128_mul 000 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 1 FOR __bid128_mul 000 () PASSED\n");
  }

  // (x * y)RN is inexact and > MidPoint
  my_rnd_mode = _IDEC_nearesteven;
  my_fpsf = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x310800000000021eull;
  x.w[LOW_128W] = 0x19e0c9bab235ede1ull;
    // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits
  y.w[HIGH_128W] = 0x310800000000d3c2ull;
  y.w[LOW_128W] = 0x1bcecced9c69132full;
    // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits
  z = __bid128_mul (x, y, my_rnd_mode, &my_fpsf);
  // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RN)
  //   9999999999999998570002110000000051 * 10^212
  if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull ||
      z.w[LOW_128W] != 0x23b52ee2d8fdec33ull ||
      my_fpsf != _IDEC_inexact) {
    //
    // 9999999999999998570002110000000051 * 10^212, inexact
    printf ("RECEIVED z="LX" "LX" my_fpsf=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], my_fpsf);
    printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec33 my_fpsf=00000020\n");
    printf ("ERROR: TEST CASE 2 FOR __bid128_mul 000 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 2 FOR __bid128_mul 000 () PASSED\n");
  }

  // (x * y)RZ is inexact; exact product > MidPoint, so RZ != RN
  my_rnd_mode = _IDEC_towardzero;
  my_fpsf = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x310800000000021eull;
  x.w[LOW_128W] = 0x19e0c9bab235ede1ull;
    // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits
  y.w[HIGH_128W] = 0x310800000000d3c2ull;
  y.w[LOW_128W] = 0x1bcecced9c69132full;
    // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits
  z = __bid128_mul (x, y, my_rnd_mode, &my_fpsf);
  // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RZ)
  //   9999999999999998570002110000000050 * 10^212
  if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull ||
      z.w[LOW_128W] != 0x23b52ee2d8fdec32ull ||
      my_fpsf != _IDEC_inexact) {
    // 9999999999999998570002110000000050 * 10^212, inexact
    printf ("RECEIVED z="LX" "LX" my_fpsf=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], my_fpsf);
    printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec32 my_fpsf=00000020\n");
    printf ("ERROR: TEST CASE 3 FOR __bid128_mul 000 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 3 FOR __bid128_mul 000 () PASSED\n");
  }

  printf ("End Decimal Floating-Point Sanity Check\n");
}
EXAMPLES/main.c_001
/******************************************************************************
Copyright (c) 2007-2024, Intel Corp. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/

// 001:
//   0 arguments passed by value (except fpsf)
//   0 rounding mode passed as argument
//   1 status flags in global variable

#include <stdio.h>
#include <stdlib.h>
#include "decimal.h"

int main () {
  Decimal128 x, y, z;
  _IDEC_round my_rnd_mode = _IDEC_dflround;

  printf ("Begin Decimal Floating-Point Sanity Check\n");

  // 2 * 3 = 6
  my_rnd_mode = _IDEC_nearesteven;
  __bid_IDEC_glbflags = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x3040000000000000ull;
  x.w[LOW_128W] = 0x0000000000000002ull; // x = 2
  y.w[HIGH_128W] = 0x3040000000000000ull;
  y.w[LOW_128W] = 0x0000000000000003ull; // y = 3
  z = __bid128_mul (x, y, my_rnd_mode);
  if (z.w[HIGH_128W] != 0x3040000000000000ull ||
      z.w[LOW_128W] != 0x0000000000000006ull ||
      __bid_IDEC_glbflags != _IDEC_allflagsclear) {
    printf ("RECEIVED z="LX" "LX" __bid_IDEC_glbflags=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], __bid_IDEC_glbflags);
    printf ("EXPECTED z=3040000000000000 0000000000000006 "
            "__bid_IDEC_glbflags=00000000\n");
    printf ("ERROR: TEST CASE 1 FOR __bid128_mul 001 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 1 FOR __bid128_mul 001 () PASSED\n");
  }

  // (x * y)RN is inexact and > MidPoint
  my_rnd_mode = _IDEC_nearesteven;
  __bid_IDEC_glbflags = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x310800000000021eull;
  x.w[LOW_128W] = 0x19e0c9bab235ede1ull;
    // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits
  y.w[HIGH_128W] = 0x310800000000d3c2ull;
  y.w[LOW_128W] = 0x1bcecced9c69132full;
    // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits
  z = __bid128_mul (x, y, my_rnd_mode);
  // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RN)
  //   9999999999999998570002110000000051 * 10^212
  if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull ||
      z.w[LOW_128W] != 0x23b52ee2d8fdec33ull ||
      __bid_IDEC_glbflags !=
      _IDEC_inexact) {
    // 9999999999999998570002110000000051 * 10^212, inexact
    printf ("RECEIVED z="LX" "LX" __bid_IDEC_glbflags=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], __bid_IDEC_glbflags);
    printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec33 "
            "__bid_IDEC_glbflags=00000020\n");
    printf ("ERROR: TEST CASE 2 FOR __bid128_mul 001 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 2 FOR __bid128_mul 001 () PASSED\n");
  }

  // (x * y)RZ is inexact; exact product > MidPoint, so RZ != RN
  my_rnd_mode = _IDEC_towardzero;
  __bid_IDEC_glbflags = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x310800000000021eull;
  x.w[LOW_128W] = 0x19e0c9bab235ede1ull;
    // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits
  y.w[HIGH_128W] = 0x310800000000d3c2ull;
  y.w[LOW_128W] = 0x1bcecced9c69132full;
    // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits
  z = __bid128_mul (x, y, my_rnd_mode);
  // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RZ)
  //   9999999999999998570002110000000050 * 10^212
  if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull ||
      z.w[LOW_128W] != 0x23b52ee2d8fdec32ull ||
      __bid_IDEC_glbflags != _IDEC_inexact) {
    // 9999999999999998570002110000000050 * 10^212, inexact
    printf ("RECEIVED z="LX" "LX" __bid_IDEC_glbflags=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], __bid_IDEC_glbflags);
    printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec32 "
            "__bid_IDEC_glbflags=00000020\n");
    printf ("ERROR: TEST CASE 3 FOR __bid128_mul 001 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 3 FOR __bid128_mul 001 () PASSED\n");
  }

  printf ("End Decimal Floating-Point Sanity Check\n");
}
EXAMPLES/main.c_010
/******************************************************************************
Copyright (c) 2007-2024, Intel Corp. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/

// 010:
//   0 arguments passed by value (except fpsf)
//   1 rounding mode passed in a global variable
//   0 pointer to status flags passed as argument

#include <stdio.h>
#include <stdlib.h>
#include "decimal.h"

int main () {
  Decimal128 x, y, z;
  _IDEC_flags my_fpsf = _IDEC_allflagsclear;

  printf ("Begin Decimal Floating-Point Sanity Check\n");

  // 2 * 3 = 6
  __bid_IDEC_glbround = _IDEC_nearesteven;
  my_fpsf = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x3040000000000000ull;
  x.w[LOW_128W] = 0x0000000000000002ull; // x = 2
  y.w[HIGH_128W] = 0x3040000000000000ull;
  y.w[LOW_128W] = 0x0000000000000003ull; // y = 3
  z = __bid128_mul (x, y, &my_fpsf);
  if (z.w[HIGH_128W] != 0x3040000000000000ull ||
      z.w[LOW_128W] != 0x0000000000000006ull ||
      my_fpsf != _IDEC_allflagsclear) {
    printf ("RECEIVED z="LX" "LX" my_fpsf=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], my_fpsf);
    printf ("EXPECTED z=3040000000000000 0000000000000006 my_fpsf=00000000\n");
    printf ("ERROR: TEST CASE 1 FOR __bid128_mul 010 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 1 FOR __bid128_mul 010 () PASSED\n");
  }

  // (x * y)RN is inexact and > MidPoint
  __bid_IDEC_glbround = _IDEC_nearesteven;
  my_fpsf = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x310800000000021eull;
  x.w[LOW_128W] = 0x19e0c9bab235ede1ull;
    // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits
  y.w[HIGH_128W] = 0x310800000000d3c2ull;
  y.w[LOW_128W] = 0x1bcecced9c69132full;
    // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits
  z = __bid128_mul (x, y, &my_fpsf);
  // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RN)
  //   9999999999999998570002110000000051 * 10^212
  if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull ||
      z.w[LOW_128W] != 0x23b52ee2d8fdec33ull ||
      my_fpsf != _IDEC_inexact) {
    // 9999999999999998570002110000000051 * 10^212,
    // inexact
    printf ("RECEIVED z="LX" "LX" my_fpsf=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], my_fpsf);
    printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec33 my_fpsf=00000020\n");
    printf ("ERROR: TEST CASE 2 FOR __bid128_mul 010 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 2 FOR __bid128_mul 010 () PASSED\n");
  }

  // (x * y)RZ is inexact; exact product > MidPoint, so RZ != RN
  __bid_IDEC_glbround = _IDEC_towardzero;
  my_fpsf = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x310800000000021eull;
  x.w[LOW_128W] = 0x19e0c9bab235ede1ull;
    // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits
  y.w[HIGH_128W] = 0x310800000000d3c2ull;
  y.w[LOW_128W] = 0x1bcecced9c69132full;
    // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits
  z = __bid128_mul (x, y, &my_fpsf);
  // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RZ)
  //   9999999999999998570002110000000050 * 10^212
  if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull ||
      z.w[LOW_128W] != 0x23b52ee2d8fdec32ull ||
      my_fpsf != _IDEC_inexact) {
    // 9999999999999998570002110000000050 * 10^212, inexact
    printf ("RECEIVED z="LX" "LX" my_fpsf=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], my_fpsf);
    printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec32 my_fpsf=00000020\n");
    printf ("ERROR: TEST CASE 3 FOR __bid128_mul 010 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 3 FOR __bid128_mul 010 () PASSED\n");
  }

  printf ("End Decimal Floating-Point Sanity Check\n");
}
EXAMPLES/main.c_011
/******************************************************************************
Copyright (c) 2007-2024, Intel Corp. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/

// 011:
//   0 arguments passed by value (except fpsf)
//   1 rounding mode passed in global variable
//   1 status flags in global variable

#include <stdio.h>
#include <stdlib.h>
#include "decimal.h"

int main () {
  Decimal128 x, y, z;

  printf ("Begin Decimal Floating-Point Sanity Check\n");

  // 2 * 3 = 6
  __bid_IDEC_glbround = _IDEC_nearesteven;
  __bid_IDEC_glbflags = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x3040000000000000ull;
  x.w[LOW_128W] = 0x0000000000000002ull; // x = 2
  y.w[HIGH_128W] = 0x3040000000000000ull;
  y.w[LOW_128W] = 0x0000000000000003ull; // y = 3
  z = __bid128_mul (x, y);
  if (z.w[HIGH_128W] != 0x3040000000000000ull ||
      z.w[LOW_128W] != 0x0000000000000006ull ||
      __bid_IDEC_glbflags != _IDEC_allflagsclear) {
    printf ("RECEIVED z="LX" "LX" __bid_IDEC_glbflags=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], __bid_IDEC_glbflags);
    printf ("EXPECTED z=3040000000000000 0000000000000006 "
            "__bid_IDEC_glbflags=00000000\n");
    printf ("ERROR: TEST CASE 1 FOR __bid128_mul 011 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 1 FOR __bid128_mul 011 () PASSED\n");
  }

  // (x * y)RN is inexact and > MidPoint
  __bid_IDEC_glbround = _IDEC_nearesteven;
  __bid_IDEC_glbflags = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x310800000000021eull;
  x.w[LOW_128W] = 0x19e0c9bab235ede1ull;
    // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits
  y.w[HIGH_128W] = 0x310800000000d3c2ull;
  y.w[LOW_128W] = 0x1bcecced9c69132full;
    // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits
  z = __bid128_mul (x, y);
  // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RN)
  //   9999999999999998570002110000000051 * 10^212
  if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull ||
      z.w[LOW_128W] != 0x23b52ee2d8fdec33ull ||
      __bid_IDEC_glbflags != _IDEC_inexact) {
    // 9999999999999998570002110000000051
    // * 10^212, inexact
    printf ("RECEIVED z="LX" "LX" __bid_IDEC_glbflags=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], __bid_IDEC_glbflags);
    printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec33 "
            "__bid_IDEC_glbflags=00000020\n");
    printf ("ERROR: TEST CASE 2 FOR __bid128_mul 011 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 2 FOR __bid128_mul 011 () PASSED\n");
  }

  // (x * y)RZ is inexact; exact product > MidPoint, so RZ != RN
  __bid_IDEC_glbround = _IDEC_towardzero;
  __bid_IDEC_glbflags = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x310800000000021eull;
  x.w[LOW_128W] = 0x19e0c9bab235ede1ull;
    // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits
  y.w[HIGH_128W] = 0x310800000000d3c2ull;
  y.w[LOW_128W] = 0x1bcecced9c69132full;
    // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits
  z = __bid128_mul (x, y);
  // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RZ)
  //   9999999999999998570002110000000050 * 10^212
  if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull ||
      z.w[LOW_128W] != 0x23b52ee2d8fdec32ull ||
      __bid_IDEC_glbflags != _IDEC_inexact) {
    // 9999999999999998570002110000000050 * 10^212, inexact
    printf ("RECEIVED z="LX" "LX" __bid_IDEC_glbflags=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], __bid_IDEC_glbflags);
    printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec32 "
            "__bid_IDEC_glbflags=00000020\n");
    printf ("ERROR: TEST CASE 3 FOR __bid128_mul 011 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 3 FOR __bid128_mul 011 () PASSED\n");
  }

  printf ("End Decimal Floating-Point Sanity Check\n");
}
EXAMPLES/main.c_100
/******************************************************************************
Copyright (c) 2007-2024, Intel Corp. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/

// 100:
//   1 arguments passed by reference
//   0 rounding mode passed as argument
//   0 pointer to status flags passed as argument

#include <stdio.h>
#include <stdlib.h>
#include "decimal.h"

int main () {
  Decimal128 x, y, z;
  _IDEC_round my_rnd_mode = _IDEC_dflround;
  _IDEC_flags my_fpsf = _IDEC_allflagsclear;

  printf ("Begin Decimal Floating-Point Sanity Check\n");

  // 2 * 3 = 6
  my_rnd_mode = _IDEC_nearesteven;
  my_fpsf = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x3040000000000000ull;
  x.w[LOW_128W] = 0x0000000000000002ull; // x = 2
  y.w[HIGH_128W] = 0x3040000000000000ull;
  y.w[LOW_128W] = 0x0000000000000003ull; // y = 3
  __bid128_mul (&z, &x, &y, &my_rnd_mode, &my_fpsf);
  if (z.w[HIGH_128W] != 0x3040000000000000ull ||
      z.w[LOW_128W] != 0x0000000000000006ull ||
      my_fpsf != _IDEC_allflagsclear) {
    printf ("RECEIVED z="LX" "LX" my_fpsf=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], my_fpsf);
    printf ("EXPECTED z=3040000000000000 0000000000000006 my_fpsf=00000000\n");
    printf ("ERROR: TEST CASE 1 FOR __bid128_mul 100 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 1 FOR __bid128_mul 100 () PASSED\n");
  }

  // (x * y)RN is inexact and > MidPoint
  my_rnd_mode = _IDEC_nearesteven;
  my_fpsf = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x310800000000021eull;
  x.w[LOW_128W] = 0x19e0c9bab235ede1ull;
    // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits
  y.w[HIGH_128W] = 0x310800000000d3c2ull;
  y.w[LOW_128W] = 0x1bcecced9c69132full;
    // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits
  __bid128_mul (&z, &x, &y, &my_rnd_mode, &my_fpsf);
  // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RN)
  //   9999999999999998570002110000000051 * 10^212
  if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull ||
      z.w[LOW_128W] != 0x23b52ee2d8fdec33ull ||
      my_fpsf != _IDEC_inexact) {
    //
9999999999999998570002110000000051 * 10^212, inexact printf ("RECEIVED z="LX" "LX" my_fpsf=%x\n", z.w[HIGH_128W], z.w[LOW_128W], my_fpsf); printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec33 my_fpsf=00000020\n"); printf ("ERROR: TEST CASE 2 FOR __bid128_mul 100 () FAILED\n\n"); exit (1); } else { printf ("TEST CASE 2 FOR __bid128_mul 100 () PASSED\n"); } // (x * y)RN is inexact and > MidPoint my_rnd_mode = _IDEC_towardzero; my_fpsf = _IDEC_allflagsclear; z.w[HIGH_128W] = 0xbaddbaddbaddbaddull; z.w[LOW_128W] = 0xbaddbaddbaddbaddull; x.w[HIGH_128W] = 0x310800000000021eull; x.w[LOW_128W] = 0x19e0c9bab235ede1ull; // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits y.w[HIGH_128W] = 0x310800000000d3c2ull; y.w[LOW_128W] = 0x1bcecced9c69132full; // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits __bid128_mul (&z, &x, &y, &my_rnd_mode, &my_fpsf); // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RN) // 9999999999999998570002110000000050 * 10^200 if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull || z.w[LOW_128W] != 0x23b52ee2d8fdec32ull || my_fpsf != _IDEC_inexact) { // 9999999999999998570002110000000050 * 10^212, inexact printf ("RECEIVED z="LX" "LX" my_fpsf=%x\n", z.w[HIGH_128W], z.w[LOW_128W], my_fpsf); printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec32 my_fpsf=00000020\n"); printf ("ERROR: TEST CASE 3 FOR __bid128_mul 100 () FAILED\n\n"); exit (1); } else { printf ("TEST CASE 3 FOR __bid128_mul 100 () PASSED\n"); } printf ("End Decimal Floating-Point Sanity Check\n"); } EXAMPLES/main.c_1010000644€­ Q00042560000001263514616534611013664 0ustar aakkasintelall/****************************************************************************** Copyright (c) 2007-2024, Intel Corp. All rights reserved. 
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/

// 101:
// 1 arguments passed by reference
// 0 rounding mode passed as argument
// 1 status flags in global variable

#include <stdio.h>
#include <stdlib.h>
#include "decimal.h"

int
main () {
  Decimal128 x, y, z;
  _IDEC_round my_rnd_mode = _IDEC_dflround;

  printf ("Begin Decimal Floating-Point Sanity Check\n");

  // 2 * 3 = 6
  my_rnd_mode = _IDEC_nearesteven;
  __bid_IDEC_glbflags = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x3040000000000000ull;
  x.w[LOW_128W] = 0x0000000000000002ull; // x = 2
  y.w[HIGH_128W] = 0x3040000000000000ull;
  y.w[LOW_128W] = 0x0000000000000003ull; // y = 3
  __bid128_mul (&z, &x, &y, &my_rnd_mode);
  if (z.w[HIGH_128W] != 0x3040000000000000ull ||
      z.w[LOW_128W] != 0x0000000000000006ull ||
      __bid_IDEC_glbflags != _IDEC_allflagsclear) {
    printf ("RECEIVED z="LX" "LX" __bid_IDEC_glbflags=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], __bid_IDEC_glbflags);
    printf ("EXPECTED z=3040000000000000 0000000000000006 "
            "__bid_IDEC_glbflags=00000000\n");
    printf ("ERROR: TEST CASE 1 FOR __bid128_mul 101 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 1 FOR __bid128_mul 101 () PASSED\n");
  }

  // (x * y)RN is inexact and > MidPoint
  my_rnd_mode = _IDEC_nearesteven;
  __bid_IDEC_glbflags = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x310800000000021eull;
  x.w[LOW_128W] = 0x19e0c9bab235ede1ull;
    // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits
  y.w[HIGH_128W] = 0x310800000000d3c2ull;
  y.w[LOW_128W] = 0x1bcecced9c69132full;
    // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits
  __bid128_mul (&z, &x, &y, &my_rnd_mode);
  // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RN)
  // 9999999999999998570002110000000051 * 10^212
  if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull ||
      z.w[LOW_128W] != 0x23b52ee2d8fdec33ull ||
      __bid_IDEC_glbflags != _IDEC_inexact) {
    // 9999999999999998570002110000000051 * 10^212, inexact
    printf ("RECEIVED z="LX" "LX" __bid_IDEC_glbflags=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], __bid_IDEC_glbflags);
    printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec33 "
            "__bid_IDEC_glbflags=00000020\n");
    printf ("ERROR: TEST CASE 2 FOR __bid128_mul 101 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 2 FOR __bid128_mul 101 () PASSED\n");
  }

  // (x * y)RZ is inexact; the exact product is > MidPoint, so RZ truncates
  my_rnd_mode = _IDEC_towardzero;
  __bid_IDEC_glbflags = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x310800000000021eull;
  x.w[LOW_128W] = 0x19e0c9bab235ede1ull;
    // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits
  y.w[HIGH_128W] = 0x310800000000d3c2ull;
  y.w[LOW_128W] = 0x1bcecced9c69132full;
    // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits
  __bid128_mul (&z, &x, &y, &my_rnd_mode);
  // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RZ)
  // 9999999999999998570002110000000050 * 10^212
  if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull ||
      z.w[LOW_128W] != 0x23b52ee2d8fdec32ull ||
      __bid_IDEC_glbflags != _IDEC_inexact) {
    // 9999999999999998570002110000000050 * 10^212, inexact
    printf ("RECEIVED z="LX" "LX" __bid_IDEC_glbflags=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], __bid_IDEC_glbflags);
    printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec32 "
            "__bid_IDEC_glbflags=00000020\n");
    printf ("ERROR: TEST CASE 3 FOR __bid128_mul 101 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 3 FOR __bid128_mul 101 () PASSED\n");
  }

  printf ("End Decimal Floating-Point Sanity Check\n");
}
EXAMPLES/main.c_1100000644€­ Q00042560000001231214616534611013654 0ustar aakkasintelall
/******************************************************************************
  Copyright (c) 2007-2024, Intel Corp. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/

// 110:
// 1 arguments passed by reference
// 1 rounding mode passed in a global variable
// 0 pointer to status flags passed as argument

#include <stdio.h>
#include <stdlib.h>
#include "decimal.h"

int
main () {
  Decimal128 x, y, z;
  _IDEC_flags my_fpsf = _IDEC_allflagsclear;

  printf ("Begin Decimal Floating-Point Sanity Check\n");

  // 2 * 3 = 6
  __bid_IDEC_glbround = _IDEC_nearesteven;
  my_fpsf = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x3040000000000000ull;
  x.w[LOW_128W] = 0x0000000000000002ull; // x = 2
  y.w[HIGH_128W] = 0x3040000000000000ull;
  y.w[LOW_128W] = 0x0000000000000003ull; // y = 3
  __bid128_mul (&z, &x, &y, &my_fpsf);
  if (z.w[HIGH_128W] != 0x3040000000000000ull ||
      z.w[LOW_128W] != 0x0000000000000006ull ||
      my_fpsf != _IDEC_allflagsclear) {
    printf ("RECEIVED z="LX" "LX" my_fpsf=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], my_fpsf);
    printf ("EXPECTED z=3040000000000000 0000000000000006 my_fpsf=00000000\n");
    printf ("ERROR: TEST CASE 1 FOR __bid128_mul 110 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 1 FOR __bid128_mul 110 () PASSED\n");
  }

  // (x * y)RN is inexact and > MidPoint
  __bid_IDEC_glbround = _IDEC_nearesteven;
  my_fpsf = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x310800000000021eull;
  x.w[LOW_128W] = 0x19e0c9bab235ede1ull;
    // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits
  y.w[HIGH_128W] = 0x310800000000d3c2ull;
  y.w[LOW_128W] = 0x1bcecced9c69132full;
    // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits
  __bid128_mul (&z, &x, &y, &my_fpsf);
  // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RN)
  // 9999999999999998570002110000000051 * 10^212
  if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull ||
      z.w[LOW_128W] != 0x23b52ee2d8fdec33ull ||
      my_fpsf != _IDEC_inexact) {
    // 9999999999999998570002110000000051 * 10^212, inexact
    printf ("RECEIVED z="LX" "LX" my_fpsf=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], my_fpsf);
    printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec33 my_fpsf=00000020\n");
    printf ("ERROR: TEST CASE 2 FOR __bid128_mul 110 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 2 FOR __bid128_mul 110 () PASSED\n");
  }

  // (x * y)RZ is inexact; the exact product is > MidPoint, so RZ truncates
  __bid_IDEC_glbround = _IDEC_towardzero;
  my_fpsf = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x310800000000021eull;
  x.w[LOW_128W] = 0x19e0c9bab235ede1ull;
    // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits
  y.w[HIGH_128W] = 0x310800000000d3c2ull;
  y.w[LOW_128W] = 0x1bcecced9c69132full;
    // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits
  __bid128_mul (&z, &x, &y, &my_fpsf);
  // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RZ)
  // 9999999999999998570002110000000050 * 10^212
  if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull ||
      z.w[LOW_128W] != 0x23b52ee2d8fdec32ull ||
      my_fpsf != _IDEC_inexact) {
    // 9999999999999998570002110000000050 * 10^212, inexact
    printf ("RECEIVED z="LX" "LX" my_fpsf=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], my_fpsf);
    printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec32 my_fpsf=00000020\n");
    printf ("ERROR: TEST CASE 3 FOR __bid128_mul 110 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 3 FOR __bid128_mul 110 () PASSED\n");
  }

  printf ("End Decimal Floating-Point Sanity Check\n");
}
EXAMPLES/main.c_1110000644€­ Q00042560000001254014616534611013660 0ustar aakkasintelall
/******************************************************************************
  Copyright (c) 2007-2024, Intel Corp. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/

// 111:
// 1 arguments passed by reference
// 1 rounding mode passed in global variable
// 1 status flags in global variable

#include <stdio.h>
#include <stdlib.h>
#include "decimal.h"

int
main () {
  Decimal128 x, y, z;

  printf ("Begin Decimal Floating-Point Sanity Check\n");

  // 2 * 3 = 6
  __bid_IDEC_glbround = _IDEC_nearesteven;
  __bid_IDEC_glbflags = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x3040000000000000ull;
  x.w[LOW_128W] = 0x0000000000000002ull; // x = 2
  y.w[HIGH_128W] = 0x3040000000000000ull;
  y.w[LOW_128W] = 0x0000000000000003ull; // y = 3
  __bid128_mul (&z, &x, &y);
  if (z.w[HIGH_128W] != 0x3040000000000000ull ||
      z.w[LOW_128W] != 0x0000000000000006ull ||
      __bid_IDEC_glbflags != _IDEC_allflagsclear) {
    printf ("RECEIVED z="LX" "LX" __bid_IDEC_glbflags=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], __bid_IDEC_glbflags);
    printf ("EXPECTED z=3040000000000000 0000000000000006 "
            "__bid_IDEC_glbflags=00000000\n");
    printf ("ERROR: TEST CASE 1 FOR __bid128_mul 111 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 1 FOR __bid128_mul 111 () PASSED\n");
  }

  // (x * y)RN is inexact and > MidPoint
  __bid_IDEC_glbround = _IDEC_nearesteven;
  __bid_IDEC_glbflags = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x310800000000021eull;
  x.w[LOW_128W] = 0x19e0c9bab235ede1ull;
    // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits
  y.w[HIGH_128W] = 0x310800000000d3c2ull;
  y.w[LOW_128W] = 0x1bcecced9c69132full;
    // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits
  __bid128_mul (&z, &x, &y);
  // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RN)
  // 9999999999999998570002110000000051 * 10^212
  if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull ||
      z.w[LOW_128W] != 0x23b52ee2d8fdec33ull ||
      __bid_IDEC_glbflags != _IDEC_inexact) {
    // 9999999999999998570002110000000051 * 10^212, inexact
    printf ("RECEIVED z="LX" "LX" __bid_IDEC_glbflags=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], __bid_IDEC_glbflags);
    printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec33 "
            "__bid_IDEC_glbflags=00000020\n");
    printf ("ERROR: TEST CASE 2 FOR __bid128_mul 111 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 2 FOR __bid128_mul 111 () PASSED\n");
  }

  // (x * y)RZ is inexact; the exact product is > MidPoint, so RZ truncates
  __bid_IDEC_glbround = _IDEC_towardzero;
  __bid_IDEC_glbflags = _IDEC_allflagsclear;
  z.w[HIGH_128W] = 0xbaddbaddbaddbaddull;
  z.w[LOW_128W] = 0xbaddbaddbaddbaddull;
  x.w[HIGH_128W] = 0x310800000000021eull;
  x.w[LOW_128W] = 0x19e0c9bab235ede1ull;
    // x = 9999999999999999340001 * 10^100; q1 = 22 <- 128 bits
  y.w[HIGH_128W] = 0x310800000000d3c2ull;
  y.w[LOW_128W] = 0x1bcecced9c69132full;
    // y = 999999999999999923000111 * 10^100; q2 = 24 <- 128 bits
  __bid128_mul (&z, &x, &y);
  // 9999999999999999340001 * 10^100 * 999999999999999923000111 * 10^100 =(RZ)
  // 9999999999999998570002110000000050 * 10^212
  if (z.w[HIGH_128W] != 0x31e9ed09bead87c0ull ||
      z.w[LOW_128W] != 0x23b52ee2d8fdec32ull ||
      __bid_IDEC_glbflags != _IDEC_inexact) {
    // 9999999999999998570002110000000050 * 10^212, inexact
    printf ("RECEIVED z="LX" "LX" __bid_IDEC_glbflags=%x\n",
            z.w[HIGH_128W], z.w[LOW_128W], __bid_IDEC_glbflags);
    printf ("EXPECTED z=31e9ed09bead87c0 23b52ee2d8fdec32 "
            "__bid_IDEC_glbflags=00000020\n");
    printf ("ERROR: TEST CASE 3 FOR __bid128_mul 111 () FAILED\n\n");
    exit (1);
  } else {
    printf ("TEST CASE 3 FOR __bid128_mul 111 () PASSED\n");
  }

  printf ("End Decimal Floating-Point Sanity Check\n");
}
LIBRARY/0000755€­ Q00042560000000000014616534611012072 5ustar aakkasintelall
LIBRARY/macbuild0000755€­ Q00042560000000375114616534611013606 0ustar aakkasintelall
rm -f *.o *.a
make CC=clang CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0
mv libbid.a clang000libbid.a
make clean
make CC=clang CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0
mv libbid.a clang001libbid.a
make clean
make CC=clang CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0
mv libbid.a clang010libbid.a
make clean
make CC=clang CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0
mv libbid.a clang011libbid.a
make clean
make CC=clang CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0
mv libbid.a clang100libbid.a
make clean
make CC=clang CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0
mv libbid.a clang101libbid.a
make clean
make CC=clang CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0
mv libbid.a clang110libbid.a
make clean
make CC=clang CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0
mv libbid.a clang111libbid.a
make clean
make CC=clang CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1
mv libbid.a clang000blibbid.a
make clean
make CC=clang CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1
mv libbid.a clang001blibbid.a
make clean
make CC=clang CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1
mv libbid.a clang010blibbid.a
make clean
make CC=clang CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1
mv libbid.a clang011blibbid.a
make clean
make CC=clang CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1
mv libbid.a clang100blibbid.a
make clean
make CC=clang CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1
mv libbid.a clang101blibbid.a
make clean
make CC=clang CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1
mv libbid.a clang110blibbid.a
make clean
make CC=clang CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1
mv libbid.a clang111blibbid.a
make clean
LIBRARY/RUNOSXINTEL640000755€­ Q00042560000000016314616534611014024 0ustar aakkasintelall
echo "BEGIN BUILDING LIBRARY IN LINUX..."
rm *.a
./macbuild COPT_ADD=-m64
echo "END BUILDING LIBRARY IN LINUX..."
LIBRARY/makefile0000755€­ Q00042560000005020314616534611013575 0ustar aakkasintelall
##############################################################################
# ==============================================================================
#  Copyright (c) 2007-2024, Intel Corp.
#  All rights reserved.
#
#  Redistribution and use in source and binary forms, with or without
#  modification, are permitted provided that the following conditions are met:
#
#    * Redistributions of source code must retain the above copyright notice,
#      this list of conditions and the following disclaimer.
#    * Redistributions in binary form must reproduce the above copyright
#      notice, this list of conditions and the following disclaimer in the
#      documentation and/or other materials provided with the distribution.
#    * Neither the name of Intel Corporation nor the names of its contributors
#      may be used to endorse or promote products derived from this software
#      without specific prior written permission.
#
#  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
#  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
#  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
#  ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
#  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
#  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
#  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
#  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
#  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
#  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
#  THE POSSIBILITY OF SUCH DAMAGE.
# ==============================================================================
# ##############################################################################
# ==============================================================================
# Makefile for math functions for the Intel(r)
# Decimal Floating-Point Math Library

HELP_TEXT := \
 @\
 @=======================================================================\
 @\
 @This makefile has the following standard (.PHONY) targets:\
 @\
 @   top           The default target. Can be modified via the symbol\
 @                 TOP. The default value is \'lib\'\
 @   lib           Builds the bid library in LIB_DIR directory\
 @   help          Prints this message\
 @   package       Creates a .tar file of the interesting sources\
 @   cleanLib      Deletes all BID library object files\
 @   realCleanLib  Deletes all BID library object files and the library \
 @   cleanBinary   Deletes all binary transcendental support object files\
 @   clean         Implies realCleanLib, cleanBinary \
 @\
 @\
 @Useful command line symbols \
 @\
 @   USE_COMPILER_F128_TYPE  'true' will use the compiler intrinsic\
 @                           128-bit floating point type. Otherwise\
 @                           use an internal 128-bit floating point\
 @                           emulation. Default is 'true'\
 @   USE_COMPILER_F80_TYPE   'true' will use the compiler intrinsic\
 @                           80-bit floating point type. Otherwise\
 @                           use the selected 128-bit choice. The\
 @                           default is 'true'\
 @   IML_MAKEFILE_PRE        Path to local makefile definitions\
 @                           is \$$(TSRC_DIR)/readtest.known_errors\
 @   Others to follow\
 @=======================================================================\

# ==============================================================================
# Define the default directory structure
# ==============================================================================

BID_SRC_ROOT ?= .
SRC_DIR      ?= $(BID_SRC_ROOT)/src
TSRC_DIR     ?= $(BID_SRC_ROOT)/tests
F128_DIR     ?= $(BID_SRC_ROOT)/float128

OBJ_DIR      ?= $(BID_SRC_ROOT)
GEN_DIR      ?= $(BID_SRC_ROOT)
TOBJ_DIR     ?= $(BID_SRC_ROOT)
EXE_DIR      ?= $(OBJ_DIR)
LIB_DIR      ?= $(OBJ_DIR)
RES_DIR      ?= $(BID_SRC_ROOT)

include makefile.iml_head

BID_SRC_ROOT := $(BID_SRC_ROOT)
SRC_DIR      := $(SRC_DIR)
TSRC_DIR     := $(TSRC_DIR)
F128_DIR     := $(F128_DIR)

OBJ_DIR      := $(OBJ_DIR)
GEN_DIR      := $(GEN_DIR)
TOBJ_DIR     := $(TOBJ_DIR)
EXE_DIR      := $(EXE_DIR)
LIB_DIR      := $(LIB_DIR)
RES_DIR      := $(RES_DIR)

# =============================================================================
# Cancel implicit rules
# =============================================================================

% : %.o
%.o : %.c

# =============================================================================
# Set up the default compilation and preprocessing flags.
# =============================================================================

_CFLAGS_INC    := -I$(SRC_DIR)
_CFLAGS_CONFIG :=
_CFLAGS_OS     := $(call HostOsTypeSelect, -DLINUX, -DWINDOWS)
_CFLAGS_ARCH   := $(call HostArchTypeSelect,-Dia32,-DITANIUM -Dia64, -Defi2)
_CFLAGS_CC     :=
_CFLAGS_OPT    :=

ifeq ($(BID_BIG_ENDIAN),true)
    _CFLAGS_CONFIG += -DBID_BIG_ENDIAN=1
endif

ifeq ($(IS_INTEL_CC),true)
    ifeq ($(CC_NAME),icl)
        _CFLAGS_CC += /Qlong-double /Qpc80 /Qstd=c99
    endif
    ifeq ($(IML_HOST_OS_TYPE),WINNT)
        ifeq ($(CC_NAME),icx)
            _CFLAGS_CC += /Qlong-double /Qpc80 /Qstd=c99
        endif
        ifeq ($(IML_HOST_OS_TYPE),LINUX)
            _CFLAGS_CC += -mlong-double-80 -pc80 -std=c99
        endif
    endif
endif

ifeq ($(IS_INTEL_CC),true)
    _USE_COMPILER_F128_TYPE := true
    _USE_COMPILER_F80_TYPE  := true
else
    _USE_COMPILER_F80_TYPE  := false
    _USE_COMPILER_F128_TYPE := false
endif

USE_COMPILER_F128_TYPE ?= $(_USE_COMPILER_F128_TYPE)
USE_COMPILER_F80_TYPE  ?= $(_USE_COMPILER_F80_TYPE)

ifneq ($(USE_COMPILER_F128_TYPE),true)
    _CFLAGS_CONFIG += -DUSE_COMPILER_F128_TYPE=0
else
    _CFLAGS_CONFIG += -DUSE_COMPILER_F128_TYPE=1
    ifeq ($(IS_INTEL_CC),true)
        _CFLAGS_CC += -Qoption,cpp,--extended_float_types
    endif
endif

ifneq ($(strip $(USE_COMPILER_F80_TYPE)),true)
    _CFLAGS_CONFIG += -DUSE_COMPILER_F80_TYPE=0
else
    _CFLAGS_CONFIG += -DUSE_COMPILER_F80_TYPE=1
endif

# =============================================================================
# Assemble all of the CFLAG parts and override values
# =============================================================================

CFLAGS_AUX    ?= $(_CFLAGS_AUX)
CFLAGS_OPT    ?= $(_CFLAGS_OPT)
CFLAGS_CC     ?= $(_CFLAGS_CC)
CFLAGS_ARCH   ?= $(_CFLAGS_ARCH)
CFLAGS_OS     ?= $(_CFLAGS_OS)
CFLAGS_INC    ?= $(_CFLAGS_INC)
CFLAGS_CONFIG ?= $(_CFLAGS_CONFIG)

CFLAGS ?= $(foreach n,INC CONFIG OS ARCH CC OPT AUX,$(CFLAGS_$n))

# get rid of extra blank characters
CFLAGS := $(foreach n,$(CFLAGS),$n)

#========================================================================
# Added BID build options (used for open source release)
#========================================================================

ifneq ($(DFP_WRAP),1)
override DFP_WRAP := 0
endif

ifneq ($(CALL_BY_REF),1)
override CALL_BY_REF := 0
else
override DFP_WRAP := 0
endif

ifneq ($(GLOBAL_RND),1)
override GLOBAL_RND := 0
override DFP_WRAP := 0
endif

ifneq ($(GLOBAL_FLAGS),1)
override GLOBAL_FLAGS := 0
override DFP_WRAP := 0
endif

BID_BLD_FLAGS := -DDECIMAL_CALL_BY_REFERENCE=$(CALL_BY_REF) \
                 -DDECIMAL_GLOBAL_ROUNDING=$(GLOBAL_RND) \
                 -DDECIMAL_GLOBAL_EXCEPTION_FLAGS=$(GLOBAL_FLAGS)

ifeq ($(UNCHANGED_BINARY_FLAGS),1)
BID_BLD_FLAGS += -DUNCHANGED_BINARY_STATUS_FLAGS
endif

ifeq ($(IS_INTEL_CC),true)
ifeq ($(DFP_WRAP),1)
BID_BLD_FLAGS += -D__DFP_WRAPPERS_ON=1
endif
endif

NO_BINARY80 := 0
ifeq ($(NO_BINARY80),1)
BID_BLD_FLAGS += -D__NO_BINARY80__
endif

# =============================================================================
# Set up default target
# =============================================================================

TOP ?= lib

.PHONY : top
top : $(TOP)

.PHONY : default

# =============================================================================
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# =============================================================================
# Targets for building the BID library
# =============================================================================
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# =============================================================================

# =============================================================================
# The BID transcendental functions assume the existence of a corresponding
# binary transcendental function. If such a function is not available on the
# target system, then we need to provide it as part of the BID library.
# If the binary functions are missing, it is usually the 128-bit quad precision
# routines. Below, we create the list of quad functions that are required by
# the BID package
# =============================================================================

F128_NAMES := $(addprefix dpml_ux_, bid bessel cbrt erf exp int inv_hyper \
                  inv_trig lgamma log mod powi pow sqrt \
                  trig ops ops_64 ) \
              dpml_four_over_pi dpml_exception sqrt_tab_t

F128_OBJS  := $(call PrefixSuffix, $(OBJ_DIR)/, $(F128_NAMES), .$O )

# =============================================================================
# For some systems, some of the double precision binary transcendentals are
# missing. Again, here we create the list of missing functions
# =============================================================================

ifeq ($(IML_HOST_OS)_$(CC_NAME),WINNT_cl)
    F53_NAMES := $(call PrefixSuffix,dpml_, asinh acosh cbrt erf erfc expm1 \
                     exp10 exp2 lgamma log1p tgamma rt_lgamma,_t) \
                 dpml_pow_t_table dpml_cbrt_t_table \
                 dpml_special_exp_t
    ifeq ($(IML_HOST_ARCH),IA32)
        F53_NAMES += dpml_log2_t
    endif
    F53_OBJS := $(call PrefixSuffix,$(OBJ_DIR)/, $(F53_NAMES), .$O)
endif

# =============================================================================
# Define the contents of the library
#
#   BID_COMMON_LIBM   Transcendental function routines that are supported in
#                     32, 64 and 128 bit forms and include bid_trans.h
#                     rather than bid_internal.h
#   BID_COMMON_OPS    Decimal operation routines that are supported in
#                     32, 64 and 128 bit forms and include bid_internal.h
#   COMMON            Decimal operation routines that are supported in
#                     32, 64 and 128 bit forms but don't have a 'bid'
#                     prefix
#   BID_              Routines that are supported in only bit forms
#   BID               Generic bid routines not specifically related to
#                     data type length
#   BID_MISC          Required files that don't begin with the 'bid'
#                     prefix
#
# =============================================================================

BID_COMMON_LIBM := \
    acos acosh asin asinh atan atan2 atanh cbrt cos cosh erf erfc exp \
    exp10 exp2 expm1 hypot lgamma log log10 log1p log2 pow sin sinh tan \
    tanh tgamma

BID_COMMON_OPS := \
    add compare div fdimd fma fmod frexp ldexp llrintd logb logbd lrintd \
    lround minmax modf mul nearbyintd next nexttowardd noncomp quantexpd \
    quantize rem round_integral scalb scalbl sqrt string to_int16 \
    to_int32 to_int64 to_int8 to_uint16 to_uint32 to_uint64 to_uint8 \
    llround llquantexpd quantumd

COMMON := strtod wcstod

BID_32 := sub to_bid128 to_bid64

BID_64 := to_bid128

BID_128 := 2_str_tables

BID := \
    binarydecimal convert_data decimal_data decimal_globals dpd \
    feclearexcept fegetexceptflag feraiseexcept fesetexceptflag \
    fetestexcept flag_operations from_int round

BID_MISC := bid128

BID_TRANS_OBJS := \
    $(call CrossCat5, $(OBJ_DIR)/bid,64 128,_,$(BID_COMMON_LIBM),.$O)

BID_INTERNAL_OBJS := \
    $(call CrossCat5, $(OBJ_DIR)/bid,32 64 128,_,$(BID_COMMON_OPS),.$O) \
    $(call CrossCat4, $(OBJ_DIR)/,$(COMMON),32 64 128,.$O) \
    $(call CrossCat3, $(OBJ_DIR)/bid32_, $(BID_COMMON_LIBM),.$O) \
    $(call CrossCat5, $(OBJ_DIR)/bid32_, $(BID_32),.$O) \
    $(call CrossCat3, $(OBJ_DIR)/bid64_, $(BID_64),.$O) \
    $(call CrossCat3, $(OBJ_DIR)/bid128_,$(BID_128),.$O) \
    $(call CrossCat3, $(OBJ_DIR)/bid_, $(BID),.$O) \
    $(call CrossCat3, $(OBJ_DIR)/, $(BID_MISC),.$O)

ALL_BID_OBJS := $(BID_TRANS_OBJS) $(BID_INTERNAL_OBJS)

ifneq ($(strip $(USE_COMPILER_F128_TYPE)),true)
    # ==========================================================================
    # Include the necessary binary transcendental routines that are not
    # available on the target system
    # ==========================================================================
    ALL_BID_OBJS := $(ALL_BID_OBJS) $(F128_OBJS) $(F53_OBJS)
endif

$(ALL_BID_OBJS) :: $(OBJ_DIR)/.directory_exists

$(OBJ_DIR)/bid_b2d.$O :: $(SRC_DIR)/bid_b2d.h
$(OBJ_DIR)/strtod32.$O :: $(SRC_DIR)/bid_strtod.h
$(OBJ_DIR)/bid64_fma.$O :: $(SRC_DIR)/bid_inline_add.h
$(OBJ_DIR)/bid32_string.$O :: $(SRC_DIR)/bid128_2_str_macros.h
$(OBJ_DIR)/bid32_string.$O :: $(SRC_DIR)/bid128_2_str.h
$(OBJ_DIR)/bid32_sqrt.$O :: $(SRC_DIR)/bid_sqrt_macros.h
$(OBJ_DIR)/bid32_div.$O :: $(SRC_DIR)/bid_div_macros.h

$(BID_TRANS_OBJS) :: $(OBJ_DIR)/%.$O : $(SRC_DIR)/%.c $(SRC_DIR)/bid_trans.h
	$(CC) -c $(FO)$@ $(CFLAGS) $(BID_BLD_FLAGS) $<

$(BID_INTERNAL_OBJS) :: $(OBJ_DIR)/%.$O : $(SRC_DIR)/%.c \
                        $(SRC_DIR)/bid_internal.h
	$(CC) -c $(FO)$@ $(CFLAGS) $(BID_BLD_FLAGS) $<

$(SRC_DIR)/bid_trans.h : $(SRC_DIR)/bid_internal.h
	touch $@

$(SRC_DIR)/bid_internal.h : $(SRC_DIR)/bid_conf.h $(SRC_DIR)/bid_functions.h
	touch $@

ifeq ($(CC),gcc)
$(SRC_DIR)/bid_functions.h : $(SRC_DIR)/bid_gcc_intrinsics.h
	touch $@
endif

BID_LIB = $(LIB_DIR)/libbid.$A
lib : $(BID_LIB) $(BID_LIB) :: $(LIB_DIR)/.directory_exists $(BID_LIB) :: $(ALL_BID_OBJS) $(AR_CMD) $(AR_OUT)$@ $^ .PHONY : cleanLib realCleanLib cleanLib : $(RM) $(ALL_BID_OBJS) realCleanLib : cleanLib $(RM) $(BID_LIB) # ============================================================================= # Targets for the non-native binary transcendental functions. # # Most of the 128-bit functions have a simple build rule: # # dpml__x.o : dpml_ux_.c dpml__x.h # $(CC) ... $< # # Files that fit this rule are included in the F128_HDR_xxx lists. Other files # are handled individually # ============================================================================= F128_PLATFORM_FLAGS := $(foreach n, IML_HOST_ARCH IML_HOST_OS CC_NAME \ ,-D$(call ToLower,$($n))) F128_CFLAGS := $(CFLAGS_OPT) -DUSE_NATIVE_QUAD_TYPE=0 $(F128_PLATFORM_FLAGS) F128_HDR_NAMES := bessel cons int lgamma powi sqrt bid erf inv_hyper \ log pow trig cbrt exp inv_trig mod F128_HDR_OBJS := $(call PrefixSuffix, $(OBJ_DIR)/dpml_ux_,$(F128_HDR_NAMES),.$O) $(F128_DIR)/dpml_ux.h : $(F128_DIR)/dpml_private.h \ $(F128_DIR)/dpml_ux_32_64.h \ $(F128_DIR)/dpml_cons_x.h touch $@ $(F128_DIR)/dpml_private.h : $(call PrefixSuffix, $(F128_DIR)/, \ build op_system compiler architecture i_format \ f_format mtc_macros mphoc_macros poly_macros \ assert dpml_names dpml_exception ix86_macros, .h) touch $@ $(OBJ_DIR)/dpml_globals.$O :: $(call PrefixSuffix, $(F128_DIR)/, \ build op_system compiler architecture f_format \ dpml_names mphoc_macros, .h) $(OBJ_DIR)/dpml_error_codes.$O :: $(call PrefixSuffix, $(F128_DIR)/, \ dpml_error_codes_enum dpml_function_info, .h) $(OBJ_DIR)/dpml_exception.$O :: $(call PrefixSuffix, $(F128_DIR)/, \ dpml_error_codes, .h) $(F128_HDR_OBJS) :: $(OBJ_DIR)/dpml_ux_%.$O : $(F128_DIR)/dpml_%_x.h $(F128_OBJS) :: $(OBJ_DIR)/%.$O : $(F128_DIR)/%.c $(F128_DIR)/dpml_ux.h $(CC) -c $(FO)$@ $(F128_CFLAGS) $< # ============================================================================= # The targets for the 
double precision binary functions are much less regular. # ============================================================================= F53_CFLAGS := $(subst -DWINDOWS,-DWNT,$(CFLAGS)) $(F53_OBJS) :: $(F128_DIR)/dpml_private.h $(F128_DIR)/dpml_globals.h \ $(F128_DIR)/dpml_error_codes_enum.h BUILD_FILE_NAME = $(basename $(notdir $@)).h D_F_NAME = -D$(call ToUpper,$(W2)) D_F_TYPE = -D$(call ToUpper,$(W3))_FLOAT $(OBJ_DIR)/dpml_asinh_t.$O :: $(F128_DIR)/sqrt_tab_t.c $(OBJ_DIR)/dpml_erf_t.$O \ $(OBJ_DIR)/dpml_asinh_t.$O :: $(OBJ_DIR)/dpml_%_t.$O : $(F128_DIR)/dpml_%.c \ $(F128_DIR)/dpml_%_t.h $(CC) -c $(FO)$@ $(F53_CFLAGS) $(D_F_NAME) $(D_F_TYPE) $(D_F_NAME) \ -DBUILD_FILE_NAME=$(BUILD_FILE_NAME) $< $(OBJ_DIR)/dpml_rt_lgamma_t.$O :: $(F128_DIR)/dpml_lgamma.c \ $(F128_DIR)/dpml_lgamma_t.h $(CC) -c $(FO)$@ $(F53_CFLAGS) -DT_FLOAT -DBUILD_FILE_NAME=dpml_lgamma_t.h $< $(OBJ_DIR)/dpml_lgamma_t.$O :: $(F128_DIR)/dpml_lgamma.c \ $(F128_DIR)/dpml_lgamma_t.h $(CC) -c $(FO)$@ $(F53_CFLAGS) $(D_F_TYPE) -DDO_LGAMMA -DHACK_GAMMA_INLINE=0 \ -DBUILD_FILE_NAME=dpml_lgamma_t.h $< $(OBJ_DIR)/dpml_acosh_t.$O :: $(F128_DIR)/dpml_asinh.c $(F128_DIR)/dpml_acosh_t.h \ $(F128_DIR)/sqrt_tab_t.c $(CC) -c $(FO)$@ $(F53_CFLAGS) $(D_F_NAME) $(D_F_TYPE) $< $(OBJ_DIR)/dpml_erfc_t.$O :: $(F128_DIR)/dpml_erf.c $(F128_DIR)/dpml_erf_t.h $(CC) -c $(FO)$@ $(F53_CFLAGS) $(D_F_NAME) $(D_F_TYPE) \ -DBUILD_FILE_NAME=dpml_erf_t.h $< $(OBJ_DIR)/dpml_log1p_t.$O :: $(F128_DIR)/dpml_log.c $(F128_DIR)/dpml_log_t.h $(CC) -c $(FO)$@ $(F53_CFLAGS) $(D_F_NAME) $(D_F_TYPE) $< $(OBJ_DIR)/dpml_log2_t.$O :: $(F128_DIR)/dpml_log.c $(F128_DIR)/dpml_log_t.h $(CC) -c $(FO)$@ $(F53_CFLAGS) $(D_F_NAME) $(D_F_TYPE) $< $(OBJ_DIR)/dpml_cbrt_t.$O :: $(F128_DIR)/dpml_cbrt.c $(F128_DIR)/dpml_cbrt_t_table.c $(CC) -c $(FO)$@ $(F53_CFLAGS) $(D_F_NAME) $(D_F_TYPE) \ -DBUILD_FILE_NAME=dpml_cbrt_t_table.c $< $(OBJ_DIR)/dpml_exp10_t.$O \ $(OBJ_DIR)/dpml_exp2_t.$O :: $(F128_DIR)/dpml_exp.c $(F128_DIR)/dpml_pow_t_table.c $(CC) -c $(FO)$@ 
$(F53_CFLAGS) $(D_F_NAME) $(D_F_TYPE) -DUSE_CONTROL87 $< $(OBJ_DIR)/dpml_expm1_t.$O :: $(F128_DIR)/dpml_expm1.c $(F128_DIR)/dpml_pow_t_table.c $(CC) -c $(FO)$@ $(F53_CFLAGS) $(D_F_NAME) $(D_F_TYPE) -DUSE_CONTROL87 $< $(OBJ_DIR)/dpml_tgamma_t.$O :: $(F128_DIR)/dpml_tgamma.c $(F128_DIR)/dpml_special_exp.h $(CC) -c $(FO)$@ $(F53_CFLAGS) $(D_F_NAME) $(D_F_TYPE) $< $(OBJ_DIR)/dpml_special_exp_t.$O :: $(F128_DIR)/dpml_exp.c \ $(F128_DIR)/dpml_pow_t_table.c $(CC) -c $(FO)$@ $(F53_CFLAGS) -DSPECIAL_EXP -DT_FLOAT $< $(OBJ_DIR)/dpml_pow_t_table.$O \ $(OBJ_DIR)/dpml_cbrt_t_table.$O :: $(OBJ_DIR)/%.$O : $(F128_DIR)/%.c $(CC) -c $(FO)$@ $(F53_CFLAGS) $(D_F_TYPE) $< cleanBinary : $(RM) $(F128_OBJS) $(F53_OBJS) # ============================================================================= # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # ============================================================================= .PHONY : clean # ============================================================================= # Clean targets # ============================================================================= clean : $(RM) $(BID_LIB) *.$O #realClean : realCleanLib cleanBinary .directory_exists: touch $@ # ============================================================================= # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # ============================================================================= # End of makefile # ============================================================================= # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # ============================================================================= LIBRARY/makefile.iml_head0000755€­ Q00042560000007174214616534611015351 0ustar aakkasintelall# ############################################################################## # ============================================================================== # Copyright 
(c) 2007-2024, Intel Corp. # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are met: # # * Redistributions of source code must retain the above copyright notice, # this list of conditions and the following disclaimer. # * Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # * Neither the name of Intel Corporation nor the names of its contributors # may be used to endorse or promote products derived from this software # without specific prior written permission. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE # LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF # SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS # INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN # CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) # ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF # THE POSSIBILITY OF SUCH DAMAGE. 
# ============================================================================== # ############################################################################## # ============================================================================== ifeq ($(origin MAKEFILE_IML_HEAD),undefined) # Guard against multiple inclusions MAKEFILE_IML_HEAD := already_seen # ============================================================================== # Performance Note: Recursively expanded assignments (i.e. assignments using the # "=" operator) can significantly degrade performance because the right-hand side # of the assignment is evaluated every time the macro is referenced. On the # other hand, for simple assignments (using the ":=" operator) the right-hand # side is only evaluated once, which can improve performance. Conditional # assignments ("?=") are recursively expanded assignments and therefore have # performance implications. We can avoid recursive expansion of conditional # assignments by immediately forcing an evaluation. For example: # # FOOBAR ?= whatever # FOOBAR := $(FOOBAR) # # Alternatively, we can use the 'Cset' macro defined below. 'Cset' is identical # to '?=', but can be used as part of a simple assignment. 
Using the 'Cset' # operator, the above example looks like: # # FOOBAR := $(call Cset,FOOBAR,whatever) # ============================================================================== __Cset = $(strip $(if $(subst $3,,$(origin $1)),$($1),$2)) Cset = $(call __Cset,$(strip $1),$(strip $2),undefined) # ============================================================================== # Pull in any local definitions # ============================================================================== ifneq ($(origin IML_MAKEFILE_PRE),undefined) include $(IML_MAKEFILE_PRE) endif # ============================================================================== # ############################################################################## # ============================================================================== # Platform independent Macros # ============================================================================== # ############################################################################## # ============================================================================== # ============================================================================== # The following set of macros is used to split the name of a target into fields # and use those field values to create -D command line switches. The macros # assume that the "fields" of a name are separated by an underscore character. # The macro Wn gets the n-th "field" of a file base name and Dn macros prepend # the string _FLAG_ to the field name and get the value associated with that # string. # # Example: Suppose the target name has the form sin_f_la., where f and la # indicate single precision floating point and low accuracy respectively. Then # W1, W2 and W3 would evaluate to sin, f and la respectively. 
Further, if the # following symbols were defined: # # _FLAG_sin := -DSIN # _FLAG_f := -DSINGLE # _FLAG_la := -DLOW_ACCURACY # # Then the target/rule: # # $(OBJ_DIR)/sin_f_la.$O: # $(CC) sincos.c $(D1) $(D2) $(D3) # # Would expand to # # $(OBJ_DIR)/sin_f_la.$O: # $(CC) sincos.c -DSIN -DSINGLE -DLOW_ACCURACY # # ============================================================================== BASENAME = $(basename $(notdir $@)) W1 = $(word 1,$(subst _, ,$(BASENAME))) W2 = $(word 2,$(subst _, ,$(BASENAME))) W3 = $(word 3,$(subst _, ,$(BASENAME))) W4 = $(word 4,$(subst _, ,$(BASENAME))) W5 = $(word 5,$(subst _, ,$(BASENAME))) D1 = $(_FLAG_$(W1)) D2 = $(_FLAG_$(W2)) D3 = $(_FLAG_$(W3)) D4 = $(_FLAG_$(W4)) D5 = $(_FLAG_$(W5)) # ============================================================================== # The opposite of breaking a name into its fields is to construct a name from # its constituent fields. The CrossCat macros produce the cross product of # multiple lists and concatenate the elements of the cross product to produce # a single name. 
# # For example, the CrossCat of the three lists "a b", "_" and "1 2 3" is # the list "a_1 a_2 a_3 b_1 b_2 b_3" # ============================================================================== _CrossCat = $(foreach a,$1,$(foreach b,$2,$(strip $a)$(strip $b))) CrossCat = $(if $1,$(if $2,$(call _CrossCat,$1,$2),$1),$2) CrossCat2 = $(call CrossCat,$1,$2) CrossCat3 = $(call CrossCat,$1,$(call CrossCat2,$2,$3)) CrossCat4 = $(call CrossCat,$1,$(call CrossCat3,$2,$3,$4)) CrossCat5 = $(call CrossCat,$1,$(call CrossCat4,$2,$3,$4,$5)) CrossCat6 = $(call CrossCat,$1,$(call CrossCat5,$2,$3,$4,$5,$6)) CrossCat7 = $(call CrossCat,$1,$(call CrossCat6,$2,$3,$4,$5,$6,$7)) CrossCat8 = $(call CrossCat,$1,$(call CrossCat7,$2,$3,$4,$5,$6,$7,$8)) CrossCat9 = $(call CrossCat,$1,$(call CrossCat8,$2,$3,$4,$5,$6,$7,$8,$9)) # ============================================================================== # The GenList macro is a variant of the CrossCat macro that creates lists of # the form: # # __ # # Where fields 1, 2 and 3 are iteratively taken from lists specified as the # 2nd, 3rd or 4th arguments to the macro. If the 3rd or 4th argument is null, it # defaults to $(IML_TYPES) and $(IML_VARIANTS) respectively. If the second # argument is null, then a null list is returned. 
# # GenPreList and GenObjList are variants of GenList that implicitly supply the # directory and file extension for preprocessed and object files respectively # ============================================================================== GenList = $(if $2,$(call CrossCat7,$1,$2, \ _,$(if $3,$3,$(IML_TYPES)), \ _,$(if $4,$4,$(IML_VARIANTS)),$5),) GenList9 = $(if $2,$(call CrossCat9,$1,$2, \ _,$(if $3,$3,$(IML_TYPES)), \ _,$(if $4,$4,$(IML_VARIANTS)),$5,$6,$7),) PrefixSuffix = $(addprefix $(strip $1),$(addsuffix $(strip $3),$(strip $2))) GenTypeList = $(call PrefixSuffix,$1,$(if $2, \ $(call CrossCat3,$2, \ _,$(if $3,$3,$(IML_TYPES))),),$4) GenTypeVarList = $(call PrefixSuffix,$1,$(if $2, \ $(call CrossCat5,$2, \ _,$(if $3,$3,$(IML_TYPES)), \ _,$(if $4,$4,$(IML_VARIANTS))),),$5) # ============================================================================== # GetIndex scans a list looking for a string. If the string is found, the index # of the string in the list (starting from 1) is returned. Otherwise a null is # returned # ============================================================================== __INDICES__ = 1 2 3 4 5 6 7 8 9 10 11 GetIndex = $(strip $(word 1,$(if $(word $(words $(__INDICES__)),$2), \ $(error "List too large. Adjust __INDICES__"), \ $(foreach n,$(__INDICES__), \ $(if $(word $n,$2), \ $(if $(subst $1,,$(word $n,$2)),,$n),\ ))))) IsListItem = $(if $(call GetIndex,$1,$2),true,false) # ============================================================================== # ToUpper and ToLower convert to upper and lower case respectively by invoking # the underlying shell. 
# ============================================================================== ToUpper = $(shell echo "$1" | tr [:lower:] [:upper:]) ToLower = $(shell echo "$1" | tr [:upper:] [:lower:]) # ============================================================================== # The macro AorBorC(sym, undefVal, target, trueVal, falseVal) selects one of # three values depending on the state of a named symbol 'sym'. If 'sym' is # undefined, it returns 'undefVal'. Otherwise, 'trueVal' or 'falseVal' # is returned depending on whether the symbol value equals 'target' or not. # # The remaining macros in this section select between two values via the AorBorC # macro by making undefVal equal to one of trueVal or falseVal # ============================================================================== AorBorC = $(if $(filter undefined,$(origin $1)),$2,$(if $(filter $3,$($1)),$4,$5)) ZeroOne = $(call AorBorC,$1,0,0,0,1) # 0 if undef or 0, 1 otherwise OneZero = $(call AorBorC,$1,1,0,0,1) # 1 if undef or 1, 0 otherwise DefUndef = $(call AorBorC,$1,-U,1,-D$1,-U$1) # -U if undef or 0, -D otherwise OneSelect = $(call AorBorC,$1,$3,1,$2,$3) # $3 if undef or 0, $2 otherwise # ============================================================================== # CleanList eliminates extra blanks between list elements # ============================================================================== CleanList = $(foreach e,$1,$(strip $e)) # ============================================================================== # GetListName is an "up level" version of the filter command. GetListName # searches a list of lists for a particular element. If the element is found, # it returns the set of lists that contained the element. Otherwise it returns # null. 
For example if: # # FF_F_FUNCS := hypot atan2 pow # F_FF_FUNCS := sincos # II_I_FUNCS := div rem # II_II_FUNCS := divrem # CC_C_FUNCS := cpow # # LISTS := FF_F_FUNCS F_FF_FUNCS II_I_FUNCS II_II_FUNCS CC_C_FUNCS # # Then # # $(call GetListName, divrem,$(LISTS)) # # will return II_II_FUNCS # # ============================================================================== GetListName = $(call CleanList,$(foreach n,$2,$(if $(filter $1,$($n)),$n,))) # ============================================================================== # NumCompare allows the comparison of two numbers by escaping to the shell # and using perl to perform an evaluation. It returns one of the strings # less, equal or greater # ============================================================================== NumCompare = $(shell $(PERL) -e \ 'print eval ($1 < $2) ? "less" : ($1 > $2) ? "greater" : "equal"') # ============================================================================== # EchoLongFileList is a partial work around to the shell command line limits on # the echo command. The basic assumption is that the file list has the form: # # //fileName # # The command creates a shorter list by stripping off the and # . parts and then echoing each name in the resulting list with the # and . fields added back it. # # The calling sequence is # # $(call EchoLongFileList,,,) # NOTE: This approach is only effective if is relatively long compared # to /fileName. 
(Which is currently the case for all IML builds) # # LongFileListToFile is a wrapper around EchoLongFileList that writes the # results to a (new) file - one file per line # ============================================================================== EchoLongFileList = for f in $(patsubst $(strip $1)%$(strip $2),%,$3); \ do echo $(strip $1$)$$f$(strip $2); done LongFileListToFile = rm -f $4; $(call EchoLongFileList,$1,$2,$3) > $4 # ============================================================================== # ############################################################################## # ============================================================================== # Symbol default values # ============================================================================== # ############################################################################## # ============================================================================== # ============================================================================== # Determine host operating system # ============================================================================== Warning = $(warning $1 = $($1)) Error = $(warning $1 = $($1)) OS_ALIAS := Linux FreeBSD Darwin SunOS HP-UX Windows_NT CYGWIN_NT-5.1 CYGWIN_NT-5.2-WOW64 CYGWIN_NT-6.1-WOW64 CYGWIN_NT-6.2-WOW64 OS_MAP_LIST := LINUX FREEBSD MACH LINUX LINUX WINNT WINNT WINNT WINNT WINNT OS_LIST := LINUX FREEBSD MACH WINNT OS_TYPE := LINUX LINUX LINUX WINNT OS_TYPES := LINUX WINNT _HOST_OS := $(shell uname) _HOST_OS_ALIAS_INDEX := $(call GetIndex,$(_HOST_OS),$(OS_ALIAS)) ifeq (,$(_HOST_OS_ALIAS_INDEX)) $(error Unknown host OS $(_HOST_OS)) endif IML_HOST_OS ?= $(word $(_HOST_OS_ALIAS_INDEX),$(OS_MAP_LIST)) IML_HOST_OS := $(IML_HOST_OS) HOST_OS_LIST_INDEX := $(call GetIndex,$(IML_HOST_OS),$(OS_LIST)) ifeq (,$(HOST_OS_LIST_INDEX)) $(error Invalid host OS $(IML_HOST_OS)) endif IML_HOST_OS_TYPE ?= $(word $(HOST_OS_LIST_INDEX),$(OS_TYPE)) IML_HOST_OS_TYPE := 
$(IML_HOST_OS_TYPE) HOST_OS_TYPE_INDEX := $(call GetIndex,$(IML_HOST_OS_TYPE),$(OS_TYPES)) # ============================================================================== # Determine host architecture. # ============================================================================== ifeq ($(IML_HOST_OS_TYPE),LINUX) ifneq ($(IML_HOST_OS),MACH) _HOST_ARCH := $(shell uname -m) else # ====================================================================== # MACH may report "i386" for uname -m command in both 32 and 64 cases # Therefore we use the following command sequence found in ICS scripts # ====================================================================== __RUN_SYSCTL := $(word 2,\ $(shell sysctl -a hw | grep hw.optional.x86_64:\ 1)) ifeq ($(__RUN_SYSCTL),1) _HOST_ARCH := x86_64 else _HOST_ARCH := x86 endif endif else ifeq ($(IML_HOST_OS_TYPE),WINNT) _HOST_ARCH := $(word 1,$(PROCESSOR_IDENTIFIER)) else $(error Don't know how to determine architecture for $(IML_HOST_OS)) endif endif ARCH_ALIAS := x86 ia64 EM64T x86_64 i686 amd64 Intel64 sun4u ARCH_LIST := IA32 IA64 EFI2 EFI2 IA32 EFI2 EFI2 EFI2 ARCH_TYPE := IA32 IA64 EFI2 EFI2 IA32 EFI2 EFI2 EFI2 ARCH_TYPES := IA32 IA64 EFI2 UARCH_LIST := SSE GSSE LRB LRB2 _HOST_ARCH_INDEX = $(call GetIndex,$(_HOST_ARCH),$(ARCH_ALIAS)) ifeq (,$(_HOST_ARCH_INDEX)) $(error Unknown host architecture $(_HOST_ARCH)) endif IML_HOST_ARCH ?= $(word $(_HOST_ARCH_INDEX),$(ARCH_LIST)) IML_HOST_ARCH := $(IML_HOST_ARCH) HOST_ARCH_LIST_INDEX := $(call GetIndex,$(IML_HOST_ARCH),$(ARCH_LIST)) ifeq (,$(HOST_ARCH_LIST_INDEX)) $(error Invalid host architecture $(IML_HOST_ARCH)) endif IML_HOST_ARCH_TYPE ?= $(word $(HOST_ARCH_LIST_INDEX),$(ARCH_TYPE)) IML_HOST_ARCH_TYPE := $(IML_HOST_ARCH_TYPE) HOST_ARCH_TYPE_INDEX := $(call GetIndex,$(IML_HOST_ARCH_TYPE),$(ARCH_TYPES)) # ============================================================================== # Set up default values of target OS and architecture # 
============================================================================== IML_TARGET_OS := $(call Cset,IML_TARGET_OS, $(IML_HOST_OS)) IML_TARGET_ARCH := $(call Cset,IML_TARGET_ARCH, $(IML_HOST_ARCH)) IML_TARGET_UARCH := $(call Cset,IML_TARGET_UARCH,SSE) TARGET_OS_LIST_INDEX := $(call GetIndex,$(IML_TARGET_OS),$(OS_LIST)) ifeq (,$(TARGET_OS_LIST_INDEX)) $(error Invalid target OS $(IML_TARGET_OS)) endif TARGET_ARCH_INDEX := $(call GetIndex,$(IML_TARGET_ARCH),$(ARCH_TYPES)) ifeq (,$(TARGET_ARCH_INDEX)) $(error Invalid target architecture $(IML_TARGET_ARCH)) endif TARGET_UARCH_INDEX := $(call GetIndex,$(IML_TARGET_UARCH),$(UARCH_LIST)) ifeq (,$(TARGET_UARCH_INDEX)) $(error Invalid target micro architecture $(IML_TARGET_UARCH)) endif IML_TARGET_OS_TYPE ?= $(word $(TARGET_OS_LIST_INDEX),$(OS_TYPE)) IML_TARGET_OS_TYPE := $(IML_TARGET_OS_TYPE) TARGET_OS_TYPE_INDEX := $(call GetIndex,$(IML_TARGET_OS_TYPE),$(OS_TYPES)) TARGET_ARCH_TYPE_INDEX := $(call GetIndex,$(IML_TARGET_ARCH),$(ARCH_TYPES)) IML_TARGET_ARCH_TYPE ?= $(word $(TARGET_ARCH_TYPE_INDEX),$(ARCH_TYPES)) IML_TARGET_ARCH_TYPE := $(IML_TARGET_ARCH_TYPE) TARGET_UARCH_TYPE_INDEX := $(TARGET_UARCH_INDEX) # ============================================================================== # Some possibly useful flag macros that can be used with the Dn macros defined # above # ============================================================================== _FLAG_s := -D_SINGLE_ _FLAG_d := -D_DOUBLE_ _FLAG_e := -D_EXTENDED_ _FLAG_lat := -D_LATENCY_ _FLAG_tp := -D_THROUGHPUT_ _FLAG_rf := -D_REDUCED_FUNCTIONALITY_ # ============================================================================== # Miscellaneous macros # ============================================================================== _EMPTY_ := _SP := $(_EMPTY_) $(_EMPTY_) # ============================================================================== # ############################################################################## # 
============================================================================== # Platform dependent macros # ============================================================================== # ############################################################################## # ============================================================================== # ============================================================================== # Windows systems can't find the "executable" sometimes if it doesn't begin with # a directory path. The macro ForceExeName checks for a directory path and, if it # doesn't exist, prepends "./" # ============================================================================== ForceExeName = $(if $(subst $1,,$(notdir $1)),$1,./$1) # ============================================================================== # Define macros to choose values depending on the OS and architecture setting. # ============================================================================== HostOsSelect = $(strip $($(HOST_OS_LIST_INDEX))) HostOsTypeSelect = $(strip $($(HOST_OS_TYPE_INDEX))) HostArchSelect = $(strip $($(HOST_ARCH_LIST_INDEX))) HostArchTypeSelect = $(strip $($(HOST_ARCH_TYPE_INDEX))) TargetOsSelect = $(strip $($(TARGET_OS_LIST_INDEX))) TargetOsTypeSelect = $(strip $($(TARGET_OS_TYPE_INDEX))) TargetArchSelect = $(strip $($(TARGET_ARCH_INDEX))) TargetArchTypeSelect = $(strip $($(TARGET_ARCH_TYPE_INDEX))) TargetUarchTypeSelect = $(strip $($(TARGET_UARCH_TYPE_INDEX))) # ============================================================================== # Define standard OS and ARCH dependent symbols # # CselOs is a combination of Cset and a named 'OsTypeSelect' routine. The value # of OS_CHOICE allows selection based on either the host or target OS. The # default value of OS_CHOICE is 'Target' # # CselArch is a combination of Cset and a named 'ArchTypeSelect' routine. # The value of ARCH_CHOICE allows selection based on either the host or target # ARCH. 
The default value of ARCH_CHOICE is OS_CHOICE # # CselOsName is a combination of Cset and a named 'OsSelect' routine. The value # of OS_CHOICE allows selection based on either the host or target OS. # This one is used for distinguishing MacOS (MACH) from other LINUX-type OSes # ============================================================================== CselOs = $(call Cset,$1,$(call $(strip $2OsTypeSelect),$3,$4)) CselArch = $(call Cset,$1,$(call $(strip $2ArchTypeSelect),$3,$4,$5)) CselOsName = $(call Cset,$1,$(call $(strip $2OsSelect),$3,$4,$5,$6)) O := $(call CselOs, O, Target, o, obj) A := $(call CselOs, A, Target, a, lib) SHR := $(call CselOsName, SHR, Target, so, so, dylib, dll) EXE := $(call CselOs, EXE, Target, , .exe) IEXT := $(call CselOs, IEXT, Target, il, iw) ASMEXT := $(call CselOs, ASMEXT, Target, s, asm) RECOGN := $(call CselOs, RECOGN, Target,-xc, -Tc) OS_CHOICE ?= TARGET _OS_CHOICE := $(if $(subst host,,$(call ToLower,$(OS_CHOICE))),Target,Host) ARCH_CHOICE ?= OS_CHOICE _ARCH_CHOICE := $(if $(subst host,,$(call ToLower,$(ARCH_CHOICE))),Target,Host) __empty:= MACH__space:=$(__empty) $(__empty) FLS := $(call CselOs,FLS, $(_OS_CHOICE), :, ;) RM := $(call CselOs, RM, $(_OS_CHOICE), rm -f, del ) AR_CMD := $(call CselOsName,AR_CMD,$(_OS_CHOICE), ar rv, ar rv, libtool, lib -nologo) AR_OUT := $(call CselOsName,AR_OUT,$(_OS_CHOICE), , , -o, /out:)$($(IML_TARGET_OS)__space) LD_CMD := $(call CselOsName,LD_CMD,$(_OS_CHOICE),icc,icc,libtool,link /nologo) LD_OUT := $(call CselOs, LD_OUT, $(_OS_CHOICE), -o, /out:)$($(IML_TARGET_OS)__space) LD_FLAGS := $(call CselOs, LD_FLAGS, $(_OS_CHOICE), -shared -nostdlib,) RC := $(call CselOs, RC, $(_OS_CHOICE), RC_not_to_be_used_with_linux,rc) RC_FLAGS := $(call CselOs, RC_FLAGS, $(_OS_CHOICE), RC_FLAGS_not_to_be_used_with_linux,) RC_OUT := $(call CselOs, RC_OUT, $(_OS_CHOICE), RC_OUT_not_to_be_used_with_linux,-Fo) PERL := $(call Cset, PERL, perl) # ============================================================================== # If 
the user hasn't set up the CC value, use an internal default value. In # either case verify the compiler choice; set up standard compiler switches; # and determine if this is an Intel compiler. # # CselCc is similar to CselOs except that selection is made based on the # compiler type rather than the OS type. # # QoptOpt returns "Q options" for turning remark messages off. I.e. QoptOpt(a,b) # returns either "-a b" or "-Qa:b" # ============================================================================== ifeq ($(origin CC_NAME),undefined) ifeq ($(origin CC),default) CC_NAME := $(call $(_OS_CHOICE)OsTypeSelect, icc, icl) else __TMP := $(strip $(subst /, ,$(firstword $(CC)))) CC_NAME := $(word $(words $(__TMP)), $(__TMP)) endif endif CC_NAME_LIST := icx icc icl gcc cl cc clang CC_TYPE_LIST := gcc gcc cl gcc cl gcc gcc CC_TYPES := gcc cl INTEL_CC_LIST := icc icl icx CC_NAME_INDEX := $(call GetIndex,$(CC_NAME),$(CC_NAME_LIST)) ifeq ($(CC_NAME_INDEX),) $(error "Unknown CC_NAME ($(CC_NAME)). Must be one of $(CC_NAME_LIST)) endif #$(error "CC_NAME_INDEX ($(CC_NAME_INDEX)). Ahmet)) CC_INDEX := $(call GetIndex,$(CC_NAME),$(CC_NAME_LIST)) CC_TYPE := $(word $(CC_INDEX),$(CC_TYPE_LIST)) CC_TYPE_INDEX := $(call GetIndex,$(CC_TYPE),$(CC_TYPES)) #$(error "CC_TYPE_INDEX ($(CC_TYPE_INDEX)). Ahmet)) #$(error "CC_INDEX ($(CC_INDEX)). 
Ahmet)) CcSelect = $(strip $($(CC_INDEX))) CcTypeSelect = $(strip $($(CC_TYPE_INDEX))) CcNameSelect = $(strip $($(CC_NAME_INDEX))) _CPP := $(CC_NAME) $(call CcNameSelect,-EP,-EP,-E -P,-EP) _CC := $(CC_NAME) $(call CcNameSelect,, -nologo,, -nologo) ifeq ($(origin CC),default) CC := $(_CC) endif ifeq ($(origin CPP),default) CPP := $(_CPP) endif CselCc = $(call Cset,$1,$(call CcTypeSelect,$2,$3)) QOPT := $(call CselCc, QOPT, -, /Q ) FO := $(call CselCc, FO, -o$(_SP), /Fo ) FE := $(call CselCc, FE, -o$(_SP), /Fe ) FA := $(call CselCc, FA, -o$(_SP), /Fa ) DBG := $(call CselCc, DBG, -O0 -g, /Od /Zi ) CC_LDFLAGS := $(call CcNameSelect,,,-lm, bufferoverflowU.lib ) QoptOpt = $(call CcTypeSelect,-$1 $2,-Q$1:$2) _NOVECMSG := $(call Cset, _NOVECMSG, $(call QoptOpt,diag-disable,vec)) IS_INTEL_CC = $(call IsListItem,$(CC_NAME),$(INTEL_CC_LIST)) ifeq ($(IS_INTEL_CC),true) ifneq ($(IML_HOST_ARCH_TYPE),IA64) NOVECMSG ?= $(_NOVECMSG) endif endif # ============================================================================== # RmLeadingDotSlash removes the leading ./ characters at the beginning of # relative path names. This can be useful when trying to pattern match file # names in automatic variables (like $?, $^, $<, etc.) because make removes the # leading ./ characters from the file names in those variables. 
# ============================================================================== ifeq ($(IML_HOST_OS),MACH) RmLeadingDotSlash = $(shell echo "$(strip $1)" | sed -E -e "s%^(\./+)+%%g") else RmLeadingDotSlash = $(shell echo "$(strip $1)" | sed -e "s%^\\(\\./\\+\\)\\+%%g") endif # ============================================================================== # Determine number of CPUs on this host # ============================================================================== ifeq ($(IML_HOST_OS_TYPE),LINUX) ifeq ($(IML_HOST_OS),MACH) _NUM_CPUS := $(shell /usr/sbin/system_profiler SPHardwareDataType | \ egrep Processors | sed -e "s/^.* //") else _NUM_CPUS := $(shell egrep processor /proc/cpuinfo | wc -l) endif else ifeq ($(IML_HOST_OS_TYPE),WINNT) _NUM_CPUS := $(NUMBER_OF_PROCESSORS) else $(error Don't know how to determine number of CPUs for $(IML_HOST_OS)) endif endif IML_NUM_CPUS ?= $(_NUM_CPUS) # ============================================================================== # Directory structure macros. # # NOTE: These are included at the end of the file so that any definitions # supplied by IML_MAKEFILE_PRE can use the previously defined symbols to # create their values. # ============================================================================== OBJ_DIR := $(call Cset,OBJ_DIR, ./obj ) SRC_DIR := $(call Cset,SRC_DIR, . ) INC_DIR := $(call Cset,INC_DIR, . ) PRE_DIR := $(call Cset,PRE_DIR, ./pre ) GEN_DIR := $(call Cset,GEN_DIR, $(PRE_DIR) ) EXE_DIR := $(call Cset,EXE_DIR, $(OBJ_DIR) ) WRK_DIR := $(call Cset,WRK_DIR, ./wrk ) RES_DIR := $(call Cset,RES_DIR, $(WRK_DIR) ) LIB_DIR := $(call Cset,LIB_DIR, $(OBJ_DIR) ) DEST_DIR := $(call Cset,DEST_DIR, $(LIB_DIR) ) LIBM_DIR := $(call Cset,LIBM_DIR, $(OBJ_DIR) ) TSRC_DIR := $(call Cset,TSRC_DIR, . 
) TOBJ_DIR := $(call Cset,TOBJ_DIR, $(OBJ_DIR) ) IML_COMMON_DIR := $(call Cset,IML_COMMON_DIR,$(LIBDEV)/mathlibs/common) IML_TOOLS_DIR := $(call Cset,IML_TOOLS_DIR,$(IML_COMMON_DIR)) DIR_EXISTS := $(call Cset,DIR_EXISTS,.directory_exists) OBJ_DIR_EXISTS := $(call Cset,OBJ_DIR_EXISTS, $(OBJ_DIR)/$(DIR_EXISTS) ) PRE_DIR_EXISTS := $(call Cset,PRE_DIR_EXISTS, $(PRE_DIR)/$(DIR_EXISTS) ) EXE_DIR_EXISTS := $(call Cset,EXE_DIR_EXISTS, $(EXE_DIR)/$(DIR_EXISTS) ) WRK_DIR_EXISTS := $(call Cset,WRK_DIR_EXISTS, $(WRK_DIR)/$(DIR_EXISTS) ) RES_DIR_EXISTS := $(call Cset,RES_DIR_EXISTS, $(RES_DIR)/$(DIR_EXISTS) ) LIB_DIR_EXISTS := $(call Cset,LIB_DIR_EXISTS, $(LIB_DIR)/$(DIR_EXISTS) ) DEST_DIR_EXISTS := $(call Cset,DEST_DIR_EXISTS, $(DEST_DIR)/$(DIR_EXISTS) ) LIBM_DIR_EXISTS := $(call Cset,LIBM_DIR_EXISTS, $(LIBM_DIR)/$(DIR_EXISTS) ) TOBJ_DIR_EXISTS := $(call Cset,TOBJ_DIR_EXISTS, $(TOBJ_DIR)/$(DIR_EXISTS) ) MKLIB := $(PERL) $(IML_TOOLS_DIR)/mklib.pl # ============================================================================== # Export interesting symbols # ============================================================================== ifndef NO_PERL5LIB_EXPORT export PERL5LIB := $(IML_TOOLS_DIR)$(FLS)$(PERL5LIB) endif endif # ifeq ($(origin MAKEFILE_IML_HEAD),undefined) # ============================================================================== # ############################################################################## # ============================================================================== # End makefile.iml_head # ============================================================================== # ############################################################################## # ============================================================================== LIBRARY/RUNWINDOWSINTEL64_ICL.bat0000755€­ Q00042560000000017214616534611015741 0ustar aakkasintelallecho "BEGIN BUILDING LIBRARY IN WINDOWS..." 
del *.lib
call windowsbuild_icl.bat
echo "END BUILDING LIBRARY IN WINDOWS..."

LIBRARY/RUNWINDOWSINTEL64_ICX.bat
echo "BEGIN BUILDING LIBRARY IN WINDOWS..."
del *.lib
call windowsbuild_icx.bat
echo "END BUILDING LIBRARY IN WINDOWS..."

LIBRARY/RUNLINUXINTEL64_CLANG
echo "BEGIN BUILDING LIBRARY IN LINUX..."
rm *.a
./linuxbuild_clang
echo "END BUILDING LIBRARY IN LINUX..."

LIBRARY/RUNLINUXINTEL64_ICC
echo "BEGIN BUILDING LIBRARY IN LINUX..."
rm *.a
./linuxbuild_icc
echo "END BUILDING LIBRARY IN LINUX..."

LIBRARY/RUNLINUXINTEL64_ICX
echo "BEGIN BUILDING LIBRARY IN LINUX..."
rm *.a
./linuxbuild_icx
echo "END BUILDING LIBRARY IN LINUX..."

LIBRARY/RUNWINDOWS_nmake.bat
echo "BEGIN BUILDING LIBRARY IN WINDOWS..."
del *.lib
call windowsbuild_nmake.bat -fmakefile.mak
echo "END BUILDING LIBRARY IN WINDOWS..."
LIBRARY/linuxbuild_clang0000755€­ Q00042560000000375014616534611015350 0ustar aakkasintelallrm -f *.o *.a make CC=clang CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a clang000libbid.a make clean make CC=clang CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a clang001libbid.a make clean make CC=clang CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a clang010libbid.a make clean make CC=clang CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a clang011libbid.a make clean make CC=clang CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a clang100libbid.a make clean make CC=clang CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a clang101libbid.a make clean make CC=clang CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a clang110libbid.a make clean make CC=clang CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a clang111libbid.a make clean make CC=clang CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a clang000blibbid.a make clean make CC=clang CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 mv libbid.a clang001blibbid.a make clean make CC=clang CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a clang010blibbid.a make clean make CC=clang CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 mv libbid.a clang011blibbid.a make clean make CC=clang CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a clang100blibbid.a make clean make CC=clang CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 mv libbid.a clang101blibbid.a make clean make CC=clang CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a clang110blibbid.a make clean make CC=clang CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 
UNCHANGED_BINARY_FLAGS=1 mv libbid.a clang111blibbid.a make clean LIBRARY/linuxbuild_icc0000755€­ Q00042560000000365114616534611015022 0ustar aakkasintelallrm -f *.o *.a make CC=icc CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icc000libbid.a make clean make CC=icc CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icc001libbid.a make clean make CC=icc CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icc010libbid.a make clean make CC=icc CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icc011libbid.a make clean make CC=icc CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icc100libbid.a make clean make CC=icc CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icc101libbid.a make clean make CC=icc CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icc110libbid.a make clean make CC=icc CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icc111libbid.a make clean make CC=icc CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a icc000blibbid.a make clean make CC=icc CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 mv libbid.a icc001blibbid.a make clean make CC=icc CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a icc010blibbid.a make clean make CC=icc CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 mv libbid.a icc011blibbid.a make clean make CC=icc CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a icc100blibbid.a make clean make CC=icc CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 mv libbid.a icc101blibbid.a make clean make CC=icc CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a icc110blibbid.a make clean make CC=icc CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 
UNCHANGED_BINARY_FLAGS=1 mv libbid.a icc111blibbid.a make clean LIBRARY/linuxbuild_gcc0000755€­ Q00042560000000365014616534611015017 0ustar aakkasintelallrm -f *.o *.a make CC=gcc CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a gcc000libbid.a make clean make CC=gcc CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a gcc001libbid.a make clean make CC=gcc CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a gcc010libbid.a make clean make CC=gcc CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a gcc011libbid.a make clean make CC=gcc CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a gcc100libbid.a make clean make CC=gcc CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a gcc101libbid.a make clean make CC=gcc CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a gcc110libbid.a make clean make CC=gcc CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a gcc111libbid.a make clean make CC=gcc CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a gcc000blibbid.a make clean make CC=gcc CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 mv libbid.a gcc001blibbid.a make clean make CC=gcc CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a gcc010blibbid.a make clean make CC=gcc CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 mv libbid.a gcc011blibbid.a make clean make CC=gcc CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a gcc100blibbid.a make clean make CC=gcc CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 mv libbid.a gcc101blibbid.a make clean make CC=gcc CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a gcc110blibbid.a make clean make CC=gcc CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 
UNCHANGED_BINARY_FLAGS=1 mv libbid.a gcc111blibbid.a make clean LIBRARY/linuxbuild_icx0000755€­ Q00042560000000365114616534611015047 0ustar aakkasintelallrm -f *.o *.a make CC=icx CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icx000libbid.a make clean make CC=icx CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icx001libbid.a make clean make CC=icx CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icx010libbid.a make clean make CC=icx CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icx011libbid.a make clean make CC=icx CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icx100libbid.a make clean make CC=icx CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icx101libbid.a make clean make CC=icx CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icx110libbid.a make clean make CC=icx CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 mv libbid.a icx111libbid.a make clean make CC=icx CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a icx000blibbid.a make clean make CC=icx CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 mv libbid.a icx001blibbid.a make clean make CC=icx CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a icx010blibbid.a make clean make CC=icx CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 mv libbid.a icx011blibbid.a make clean make CC=icx CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a icx100blibbid.a make clean make CC=icx CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 mv libbid.a icx101blibbid.a make clean make CC=icx CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 mv libbid.a icx110blibbid.a make clean make CC=icx CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 
UNCHANGED_BINARY_FLAGS=1 mv libbid.a icx111blibbid.a make clean LIBRARY/RUNWINDOWSINTEL64_CL.bat0000755€­ Q00042560000000017114616534611015627 0ustar aakkasintelallecho "BEGIN BUILDING LIBRARY IN WINDOWS..." del *.lib call windowsbuild_cl.bat echo "END BUILDING LIBRARY IN WINDOWS..." LIBRARY/windowsbuild_icl.bat0000755€­ Q00042560000000443214616534611016131 0ustar aakkasintelalldel *.obj *.lib make _HOST_OS=Windows_NT CC=icl CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icl000libbid.lib make clean make _HOST_OS=Windows_NT CC=icl CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icl001libbid.lib make clean make _HOST_OS=Windows_NT CC=icl CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icl010libbid.lib make clean make _HOST_OS=Windows_NT CC=icl CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icl011libbid.lib make clean make _HOST_OS=Windows_NT CC=icl CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icl100libbid.lib make clean make _HOST_OS=Windows_NT CC=icl CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icl101libbid.lib make clean make _HOST_OS=Windows_NT CC=icl CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icl110libbid.lib make clean make _HOST_OS=Windows_NT CC=icl CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icl111libbid.lib make clean make _HOST_OS=Windows_NT CC=icl CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icl000blibbid.lib make clean make _HOST_OS=Windows_NT CC=icl CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icl001blibbid.lib make clean make _HOST_OS=Windows_NT CC=icl CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icl010blibbid.lib make clean make _HOST_OS=Windows_NT CC=icl 
CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icl011blibbid.lib make clean make _HOST_OS=Windows_NT CC=icl CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icl100blibbid.lib make clean make _HOST_OS=Windows_NT CC=icl CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icl101blibbid.lib make clean make _HOST_OS=Windows_NT CC=icl CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icl110blibbid.lib make clean make _HOST_OS=Windows_NT CC=icl CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icl111blibbid.lib make clean LIBRARY/windowsbuild_icx.bat0000755€­ Q00042560000000443214616534611016145 0ustar aakkasintelalldel *.obj *.lib make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icx000libbid.lib make clean make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icx001libbid.lib make clean make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icx010libbid.lib make clean make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icx011libbid.lib make clean make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icx100libbid.lib make clean make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icx101libbid.lib make clean make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icx110libbid.lib make clean make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib icx111libbid.lib make clean make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=0 GLOBAL_RND=0 
GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icx000blibbid.lib make clean make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icx001blibbid.lib make clean make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icx010blibbid.lib make clean make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icx011blibbid.lib make clean make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icx100blibbid.lib make clean make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icx101blibbid.lib make clean make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icx110blibbid.lib make clean make _HOST_OS=Windows_NT CC=icx CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib icx111blibbid.lib make clean LIBRARY/windowsbuild_nmake.bat0000755€­ Q00042560000000437714616534611016465 0ustar aakkasintelalldel *.obj *.lib nmake %1 CC=cl CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib cl000libbid.lib nmake %1 clean nmake %1 CC=cl CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib cl001libbid.lib nmake %1 clean nmake %1 CC=cl CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib cl010libbid.lib nmake %1 clean nmake %1 CC=cl CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib cl011libbid.lib nmake %1 clean nmake %1 CC=cl CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib cl100libbid.lib nmake %1 clean nmake %1 CC=cl CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib cl101libbid.lib nmake %1 clean nmake %1 CC=cl CALL_BY_REF=1 
GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib cl110libbid.lib nmake %1 clean nmake %1 CC=cl CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0 ren libbid.lib cl111libbid.lib nmake %1 clean nmake %1 clean nmake %1 CC=cl CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib cl000blibbid.lib nmake %1 clean nmake %1 CC=cl CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib cl001blibbid.lib nmake %1 clean nmake %1 CC=cl CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib cl010blibbid.lib nmake %1 clean nmake %1 CC=cl CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib cl011blibbid.lib nmake %1 clean nmake %1 CC=cl CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib cl100blibbid.lib nmake %1 clean nmake %1 CC=cl CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib cl101blibbid.lib nmake %1 clean nmake %1 CC=cl CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib cl110blibbid.lib nmake %1 clean nmake %1 CC=cl CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1 ren libbid.lib cl111blibbid.lib nmake %1 clean LIBRARY/makefile.mak0000755€­ Q00042560000002732714616534611014357 0ustar aakkasintelall# ############################################################################## # ============================================================================== # Copyright (c) 2007-2024, Intel Corp. # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are met: # # * Redistributions of source code must retain the above copyright notice, # this list of conditions and the following disclaimer. 
#   * Redistributions in binary form must reproduce the above copyright
#     notice, this list of conditions and the following disclaimer in the
#     documentation and/or other materials provided with the distribution.
#   * Neither the name of Intel Corporation nor the names of its contributors
#     may be used to endorse or promote products derived from this software
#     without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
# THE POSSIBILITY OF SUCH DAMAGE.
# ==============================================================================
# ##############################################################################
# ==============================================================================
# Makefile for math functions for the Intel(r)
# Decimal Floating-Point Math Library

AR=lib
AOPT=-nologo

_USE_NATIVE_128b=TRUE

!IF ("$(CC)" == "icl")
CFLAGS=-nologo -DUSE_COMPILER_F128_TYPE=1 -DUSE_COMPILER_F80_TYPE=1 -DWINDOWS\
    -od /Qlong-double /Qpc80 /Qstd=c99\
    -Qoption,cpp,--extended_float_types\
    -UBID_BIG_ENDIAN
!ELSE
_USE_NATIVE_128b=FALSE
CFLAGS=-nologo /DUSE_COMPILER_F128_TYPE=0 /DUSE_COMPILER_F80_TYPE=0 /DWINDOWS -UBID_BIG_ENDIAN\
    -UBID_BIG_ENDIAN
!ENDIF

!IFDEF DBG
!IF ($(DBG)==1)
DEBUG=/Od /Zi
!ELSE
DEBUG=
!ENDIF
!ELSE
DEBUG=
!ENDIF

CFLAG_F128_CONF=-DUSE_NATIVE_QUAD_TYPE=0 -Dia32 -Dwnt -UBID_BIG_ENDIAN
CFLAG_F53_CONF=-DT_FLOAT -Dia32 -Dwnt -UBID_BIG_ENDIAN

!IFDEF CALL_BY_REF
!IF ($(CALL_BY_REF)==1)
COPT_REF=-DDECIMAL_CALL_BY_REFERENCE=1
!ELSE
COPT_REF=-DDECIMAL_CALL_BY_REFERENCE=0
!ENDIF
!ENDIF

!IFDEF GLOBAL_RND
!IF ($(GLOBAL_RND)==1)
COPT_RND=-DDECIMAL_GLOBAL_ROUNDING=1
!ELSE
COPT_RND=-DDECIMAL_GLOBAL_ROUNDING=0
!ENDIF
!ENDIF

!IFDEF GLOBAL_FLAGS
!IF ($(GLOBAL_FLAGS)==1)
COPT_GLOBAL=-DDECIMAL_GLOBAL_EXCEPTION_FLAGS=1
!ELSE
COPT_GLOBAL=-DDECIMAL_GLOBAL_EXCEPTION_FLAGS=0
!ENDIF
!ENDIF

!IFDEF UNCHANGED_BINARY_FLAGS
!IF ($(UNCHANGED_BINARY_FLAGS)==1)
COPT_UNCHANGED_BINARY=-DUNCHANGED_BINARY_STATUS_FLAGS
!ENDIF
!ENDIF

O=.
S=.\src F=.\float128 OBJ=obj AFLAG=-nologo BID_LIB=libbid.lib BID_OBJS = \ $O\bid128.$(OBJ) $O\bid128_2_str_tables.$(OBJ) $O\bid128_acos.$(OBJ) $O\bid128_acosh.$(OBJ) $O\bid128_add.$(OBJ) $O\bid128_asin.$(OBJ)\ $O\bid128_asinh.$(OBJ) $O\bid128_atan.$(OBJ) $O\bid128_atan2.$(OBJ) $O\bid128_atanh.$(OBJ) $O\bid128_cbrt.$(OBJ) $O\bid128_compare.$(OBJ)\ $O\bid128_cos.$(OBJ) $O\bid128_cosh.$(OBJ) $O\bid128_div.$(OBJ) $O\bid128_erf.$(OBJ) $O\bid128_erfc.$(OBJ) $O\bid128_exp.$(OBJ)\ $O\bid128_exp10.$(OBJ) $O\bid128_exp2.$(OBJ) $O\bid128_expm1.$(OBJ) $O\bid128_fdimd.$(OBJ) $O\bid128_fma.$(OBJ) $O\bid128_fmod.$(OBJ)\ $O\bid128_frexp.$(OBJ) $O\bid128_hypot.$(OBJ) $O\bid128_ldexp.$(OBJ) $O\bid128_lgamma.$(OBJ) $O\bid128_llrintd.$(OBJ) $O\bid128_log.$(OBJ)\ $O\bid128_log10.$(OBJ) $O\bid128_log1p.$(OBJ) $O\bid128_log2.$(OBJ) $O\bid128_logb.$(OBJ) $O\bid128_logbd.$(OBJ) $O\bid128_lrintd.$(OBJ)\ $O\bid128_lround.$(OBJ) $O\bid128_minmax.$(OBJ) $O\bid128_modf.$(OBJ) $O\bid128_mul.$(OBJ) $O\bid128_nearbyintd.$(OBJ) $O\bid128_next.$(OBJ)\ $O\bid128_nexttowardd.$(OBJ) $O\bid128_noncomp.$(OBJ) $O\bid128_pow.$(OBJ) $O\bid128_quantexpd.$(OBJ) $O\bid128_quantize.$(OBJ) $O\bid128_rem.$(OBJ)\ $O\bid128_round_integral.$(OBJ) $O\bid128_scalb.$(OBJ) $O\bid128_scalbl.$(OBJ) $O\bid128_sin.$(OBJ) $O\bid128_sinh.$(OBJ) $O\bid128_sqrt.$(OBJ)\ $O\bid128_string.$(OBJ) $O\bid128_tan.$(OBJ) $O\bid128_tanh.$(OBJ) $O\bid128_tgamma.$(OBJ) $O\bid128_to_int16.$(OBJ) $O\bid128_to_int32.$(OBJ)\ $O\bid128_to_int64.$(OBJ) $O\bid128_to_int8.$(OBJ) $O\bid128_to_uint16.$(OBJ) $O\bid128_to_uint32.$(OBJ) $O\bid128_to_uint64.$(OBJ) $O\bid128_to_uint8.$(OBJ)\ $O\bid32_acos.$(OBJ) $O\bid32_acosh.$(OBJ) $O\bid32_add.$(OBJ) $O\bid32_asin.$(OBJ) $O\bid32_asinh.$(OBJ) $O\bid32_atan.$(OBJ)\ $O\bid32_atan2.$(OBJ) $O\bid32_atanh.$(OBJ) $O\bid32_cbrt.$(OBJ) $O\bid32_compare.$(OBJ) $O\bid32_cos.$(OBJ) $O\bid32_cosh.$(OBJ)\ $O\bid32_div.$(OBJ) $O\bid32_erf.$(OBJ) $O\bid32_erfc.$(OBJ) $O\bid32_exp.$(OBJ) $O\bid32_exp10.$(OBJ) 
$O\bid32_exp2.$(OBJ)\ $O\bid32_expm1.$(OBJ) $O\bid32_fdimd.$(OBJ) $O\bid32_fma.$(OBJ) $O\bid32_fmod.$(OBJ) $O\bid32_frexp.$(OBJ) $O\bid32_hypot.$(OBJ)\ $O\bid32_ldexp.$(OBJ) $O\bid32_lgamma.$(OBJ) $O\bid32_llrintd.$(OBJ) $O\bid32_log.$(OBJ) $O\bid32_log10.$(OBJ) $O\bid32_log1p.$(OBJ)\ $O\bid32_log2.$(OBJ) $O\bid32_logb.$(OBJ) $O\bid32_logbd.$(OBJ) $O\bid32_lrintd.$(OBJ) $O\bid32_lround.$(OBJ) $O\bid32_minmax.$(OBJ)\ $O\bid32_modf.$(OBJ) $O\bid32_mul.$(OBJ) $O\bid32_nearbyintd.$(OBJ) $O\bid32_next.$(OBJ) $O\bid32_nexttowardd.$(OBJ) $O\bid32_noncomp.$(OBJ)\ $O\bid32_pow.$(OBJ) $O\bid32_quantexpd.$(OBJ) $O\bid32_quantize.$(OBJ) $O\bid32_rem.$(OBJ) $O\bid32_round_integral.$(OBJ) $O\bid32_scalb.$(OBJ)\ $O\bid32_scalbl.$(OBJ) $O\bid32_sin.$(OBJ) $O\bid32_sinh.$(OBJ) $O\bid32_sqrt.$(OBJ) $O\bid32_string.$(OBJ) $O\bid32_sub.$(OBJ)\ $O\bid32_tan.$(OBJ) $O\bid32_tanh.$(OBJ) $O\bid32_tgamma.$(OBJ) $O\bid32_to_bid128.$(OBJ) $O\bid32_to_bid64.$(OBJ) $O\bid32_to_int16.$(OBJ)\ $O\bid32_to_int32.$(OBJ) $O\bid32_to_int64.$(OBJ) $O\bid32_to_int8.$(OBJ) $O\bid32_to_uint16.$(OBJ) $O\bid32_to_uint32.$(OBJ) $O\bid32_to_uint64.$(OBJ)\ $O\bid32_to_uint8.$(OBJ) $O\bid64_acos.$(OBJ) $O\bid64_acosh.$(OBJ) $O\bid64_add.$(OBJ) $O\bid64_asin.$(OBJ) $O\bid64_asinh.$(OBJ)\ $O\bid64_atan.$(OBJ) $O\bid64_atan2.$(OBJ) $O\bid64_atanh.$(OBJ) $O\bid64_cbrt.$(OBJ) $O\bid64_compare.$(OBJ) $O\bid64_cos.$(OBJ)\ $O\bid64_cosh.$(OBJ) $O\bid64_div.$(OBJ) $O\bid64_erf.$(OBJ) $O\bid64_erfc.$(OBJ) $O\bid64_exp.$(OBJ) $O\bid64_exp10.$(OBJ)\ $O\bid64_exp2.$(OBJ) $O\bid64_expm1.$(OBJ) $O\bid64_fdimd.$(OBJ) $O\bid64_fma.$(OBJ) $O\bid64_fmod.$(OBJ) $O\bid64_frexp.$(OBJ)\ $O\bid64_hypot.$(OBJ) $O\bid64_ldexp.$(OBJ) $O\bid64_lgamma.$(OBJ) $O\bid64_llrintd.$(OBJ) $O\bid64_log.$(OBJ) $O\bid64_log10.$(OBJ)\ $O\bid64_log1p.$(OBJ) $O\bid64_log2.$(OBJ) $O\bid64_logb.$(OBJ) $O\bid64_logbd.$(OBJ) $O\bid64_lrintd.$(OBJ) $O\bid64_lround.$(OBJ)\ $O\bid64_minmax.$(OBJ) $O\bid64_modf.$(OBJ) $O\bid64_mul.$(OBJ) 
$O\bid64_nearbyintd.$(OBJ) $O\bid64_next.$(OBJ) $O\bid64_nexttowardd.$(OBJ)\ $O\bid64_noncomp.$(OBJ) $O\bid64_pow.$(OBJ) $O\bid64_quantexpd.$(OBJ) $O\bid64_quantize.$(OBJ) $O\bid64_rem.$(OBJ) $O\bid64_round_integral.$(OBJ)\ $O\bid64_scalb.$(OBJ) $O\bid64_scalbl.$(OBJ) $O\bid64_sin.$(OBJ) $O\bid64_sinh.$(OBJ) $O\bid64_sqrt.$(OBJ) $O\bid64_string.$(OBJ)\ $O\bid64_tan.$(OBJ) $O\bid64_tanh.$(OBJ) $O\bid64_tgamma.$(OBJ) $O\bid64_to_bid128.$(OBJ) $O\bid64_to_int16.$(OBJ) $O\bid64_to_int32.$(OBJ)\ $O\bid64_to_int64.$(OBJ) $O\bid64_to_int8.$(OBJ) $O\bid64_to_uint16.$(OBJ) $O\bid64_to_uint32.$(OBJ) $O\bid64_to_uint64.$(OBJ) $O\bid64_to_uint8.$(OBJ)\ $O\bid_binarydecimal.$(OBJ) $O\bid_convert_data.$(OBJ) $O\bid_decimal_data.$(OBJ) $O\bid_decimal_globals.$(OBJ) $O\bid_dpd.$(OBJ) $O\bid_feclearexcept.$(OBJ)\ $O\bid_fegetexceptflag.$(OBJ) $O\bid_feraiseexcept.$(OBJ) $O\bid_fesetexceptflag.$(OBJ) $O\bid_fetestexcept.$(OBJ) $O\bid_flag_operations.$(OBJ) $O\bid_from_int.$(OBJ)\ $O\bid_round.$(OBJ) $O\strtod128.$(OBJ) $O\strtod32.$(OBJ) $O\strtod64.$(OBJ) $O\wcstod128.$(OBJ) $O\wcstod32.$(OBJ) $O\wcstod64.$(OBJ) \ $O\bid32_llround.$(OBJ) $O\bid64_llround.$(OBJ) $O\bid128_llround.$(OBJ) \ $O\bid32_llquantexpd.$(OBJ) $O\bid64_llquantexpd.$(OBJ) $O\bid128_llquantexpd.$(OBJ) \ $O\bid32_quantumd.$(OBJ) $O\bid64_quantumd.$(OBJ) $O\bid128_quantumd.$(OBJ) FLOAT128_OBJS = \ $O\dpml_ux_bid.$(OBJ) $O\dpml_ux_bessel.$(OBJ) $O\dpml_ux_cbrt.$(OBJ) $O\dpml_ux_erf.$(OBJ) $O\dpml_ux_exp.$(OBJ) $O\dpml_ux_int.$(OBJ)\ $O\dpml_ux_inv_hyper.$(OBJ) $O\dpml_ux_inv_trig.$(OBJ) $O\dpml_ux_lgamma.$(OBJ) $O\dpml_ux_log.$(OBJ) $O\dpml_ux_mod.$(OBJ)\ $O\dpml_ux_powi.$(OBJ) $O\dpml_ux_pow.$(OBJ) $O\dpml_ux_sqrt.$(OBJ) $O\dpml_ux_trig.$(OBJ) $O\dpml_ux_ops.$(OBJ) $O\dpml_ux_ops_64.$(OBJ)\ $O\dpml_four_over_pi.$(OBJ) $O\dpml_exception.$(OBJ) $O\sqrt_tab_t.$(OBJ) FLOAT53_OBJS = \ $O\dpml_asinh_t.$(OBJ) $O\dpml_acosh_t.$(OBJ) $O\dpml_cbrt_t.$(OBJ) $O\dpml_erf_t.$(OBJ) $O\dpml_erfc_t.$(OBJ) $O\dpml_expm1_t.$(OBJ)\ 
    $O\dpml_exp10_t.$(OBJ) $O\dpml_exp2_t.$(OBJ) $O\dpml_lgamma_t.$(OBJ) $O\dpml_log1p_t.$(OBJ) $O\dpml_log2_t.$(OBJ) $O\dpml_tgamma_t.$(OBJ)\
    $O\dpml_rt_lgamma_t.$(OBJ) $O\dpml_pow_t_table.$(OBJ) $O\dpml_cbrt_t_table.$(OBJ) $O\dpml_special_exp_t.$(OBJ)

ALL: $(BID_LIB)

.SUFFIXES:
.SUFFIXES: .obj .c

BID_OBJECTS: $(BID_OBJS)

{$S}.c{$O}.$(OBJ) :
    $(CC) -I\$S -c -Fd$O\ $(CFLAGS) $(COPT_REF) $(COPT_RND) $(COPT_GLOBAL) $(COPT_UNCHANGED_BINARY) $(DEBUG) $<

$(BID_OBJS) :

FLOAT128: $(FLOAT128_OBJS)

{$F}.c{$O}.$(OBJ) ::
    $(CC) -I\$F -c -Fd$O\ $(CFLAG_F128_CONF) $(DEBUG) $<

$(FLOAT128_OBJS) :

FLOAT53: $(FLOAT53_OBJS)

$O\dpml_asinh_t.$(OBJ) : $F\dpml_asinh.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) -DASINH $(DEBUG) $**

$O\dpml_acosh_t.$(OBJ) : $F\dpml_asinh.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) -DACOSH $(DEBUG) $**

$O\dpml_cbrt_t.$(OBJ) : $F\dpml_cbrt.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) -DCBRT -DBUILD_FILE_NAME=$(**B)_t_table.c $(DEBUG) $**

$O\dpml_erf_t.$(OBJ) : $F\dpml_erf.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) -DERF -DBUILD_FILE_NAME=$(**B)_t.h $(DEBUG) $**

$O\dpml_erfc_t.$(OBJ) : $F\dpml_erf.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) -DERFC -DBUILD_FILE_NAME=$(**B)_t.h $(DEBUG) $**

$O\dpml_expm1_t.$(OBJ) : $F\dpml_expm1.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) -DEXPM1 -DUSE_CONTROL87 $(DEBUG) $**

$O\dpml_exp10_t.$(OBJ) : $F\dpml_exp.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) -DEXP10 -DUSE_CONTROL87 $(DEBUG) $**

$O\dpml_exp2_t.$(OBJ) : $F\dpml_exp.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) -DEXP2 -DUSE_CONTROL87 $(DEBUG) $**

$O\dpml_lgamma_t.$(OBJ) : $F\dpml_lgamma.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) -DDO_LGAMMA -DHACK_GAMMA_INLINE=0 -DBUILD_FILE_NAME=$(**B)_t.h $(DEBUG) $**

$O\dpml_log2_t.$(OBJ) : $F\dpml_log.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) -DLOG2 -DBASE_OF_LOG=1 $(DEBUG) $**

$O\dpml_log1p_t.$(OBJ) : $F\dpml_log.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) -DLOG1P $(DEBUG) $**

$O\dpml_tgamma_t.$(OBJ) : $F\dpml_tgamma.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) -DTGAMMA $(DEBUG) $**

$O\dpml_rt_lgamma_t.$(OBJ) : $F\dpml_lgamma.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) -DBUILD_FILE_NAME=$(**B)_t.h $(DEBUG) $**

$O\dpml_pow_t_table.$(OBJ) : $F\dpml_pow_t_table.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) $(DEBUG) $**

$O\dpml_cbrt_t_table.$(OBJ) : $F\dpml_cbrt_t_table.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) $(DEBUG) $**

$O\dpml_special_exp_t.$(OBJ) : $F\dpml_exp.c
    $(CC) -c /Fo$@ $(CFLAG_F53_CONF) -DSPECIAL_EXP $(DEBUG) $**

!IF ("$(_USE_NATIVE_128b)"=="FALSE")
$(BID_LIB): $(BID_OBJS) $(FLOAT128_OBJS) $(FLOAT53_OBJS)
    $(AR) $(AOPT) /out:$(BID_LIB) $(BID_OBJS) $(FLOAT128_OBJS) $(FLOAT53_OBJS)
!ELSE
#Use native 128b data types
$(BID_LIB): $(BID_OBJS)
    $(AR) $(AOPT) /out:$(BID_LIB) $(BID_OBJS)
!ENDIF

clean :
    del *.$(OBJ) libbid.lib

LIBRARY/windowsbuild_cl.bat
del *.obj *.lib
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib cl000libbid.lib
make clean
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib cl001libbid.lib
make clean
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib cl010libbid.lib
make clean
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib cl011libbid.lib
make clean
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib cl100libbid.lib
make clean
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib cl101libbid.lib
make clean
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib cl110libbid.lib
make clean
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib cl111libbid.lib
make clean
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib cl000blibbid.lib
make clean
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib cl001blibbid.lib
make clean
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib cl010blibbid.lib
make clean
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib cl011blibbid.lib
make clean
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib cl100blibbid.lib
make clean
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib cl101blibbid.lib
make clean
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib cl110blibbid.lib
make clean
make _HOST_OS=Windows_NT CC=cl CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib cl111blibbid.lib
make clean

LIBRARY/windowsbuild_clang.bat
del *.obj *.lib
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib clang000libbid.lib
make clean
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib clang001libbid.lib
make clean
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib clang010libbid.lib
make clean
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib clang011libbid.lib
make clean
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib clang100libbid.lib
make clean
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib clang101libbid.lib
make clean
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib clang110libbid.lib
make clean
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=0
ren libbid.lib clang111libbid.lib
make clean
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib clang000blibbid.lib
make clean
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib clang001blibbid.lib
make clean
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib clang010blibbid.lib
make clean
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=0 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib clang011blibbid.lib
make clean
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib clang100blibbid.lib
make clean
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=1 GLOBAL_RND=0 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib clang101blibbid.lib
make clean
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib clang110blibbid.lib
make clean
make _HOST_OS=Windows_NT CC=clang CALL_BY_REF=1 GLOBAL_RND=1 GLOBAL_FLAGS=1 UNCHANGED_BINARY_FLAGS=1
ren libbid.lib clang111blibbid.lib
make clean

LIBRARY/RUNLINUXINTEL64_GCC
echo "BEGIN BUILDING LIBRARY IN LINUX..."
rm *.a
./linuxbuild_gcc
echo "END BUILDING LIBRARY IN LINUX..."

LIBRARY/.directory_exists

LIBRARY/RUNWINDOWSINTEL64_CLANG.bat
echo "BEGIN BUILDING LIBRARY IN WINDOWS..."
del *.lib call windowsbuild_clang.bat echo "END BUILDING LIBRARY IN WINDOWS..." LIBRARY/README0000755€­ Q00042560000000643614616721001012755 0ustar aakkasintelallTo build the Intel(R) Decimal Floating-Point Math Library V2.3 (Version 2, Update 3) (conforming to the IEEE Standard 754-2019 for Floating-Point Arithmetic) on processors that are implementations of the Intel(R) 64 Architecture: In Linux* with icx (Intel(R) Compiler 2024.2 or newer) or gcc: make clean make CC=icx CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 - CC can be icx, icc, gcc, clang - CALL_BY_REF, GLOBAL_RND, GLOBAL_FLAGS, UNCHANGED_BINARY_FLAGS can be any of 0000, 0001, ... , 1111 Big-endian builds are possible, but will require additional command line parameters In Windows** with icx (Intel(R) C++ Compiler 2024.2 or newer) or cl (Microsoft Visual C++ Compiler**): nmake clean nmake -fmakefile.mak CC=icx CALL_BY_REF=0 GLOBAL_RND=0 GLOBAL_FLAGS=0 UNCHANGED_BINARY_FLAGS=0 - CC can be cl, icx, icl - CALL_BY_REF, GLOBAL_RND, GLOBAL_FLAGS, UNCHANGED_BINARY_FLAGS can be any of 0000, 0001, ... , 1111 - [g]make, which stands for a GNU make-compatible make program (e.g. make from cygwin) may also be used Note: The scripts and makefiles provided here may need adjustments, depending on the environment in which they are used; for example if moving files from Windows to Linux, running dos2unix on the Linux script files may be necessary. The makefiles currently support these environments, but can be extended to support additional ones: Linux, Windows, MacOS. See makefile.iml_head for more information. Note: ===== For some other operating systems and architecture combinations see the following command files, as well as any command files invoked from these ones: RUNWINDOWS_nmake.bat RUNOSXINTEL64 These command files build the Intel(R) Decimal Floating-Point Math Library, possibly using more than one compiler. Changes may be necessary in certain environments. 
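As a rough illustration of the kind of loop such command files perform, the following dry-run sketch prints one make invocation for each combination of the four build options. It only prints the commands and does not run them. The default compiler (gcc), the helper name list_variants, and the library names shown in the trailing comments are assumptions modeled on the Windows batch files; this README does not specify them.

```shell
#!/bin/sh
# Dry-run sketch: print the make command for each of the 16 option combinations.
# Assumptions (hypothetical, not taken from this README): the gcc default and
# the "<cc><bits>[b]libbid.a" naming, which mirrors the Windows batch files.
CC="${CC:-gcc}"

list_variants() {
    for ubf in 0 1; do
        for ref in 0 1; do
            for rnd in 0 1; do
                for flags in 0 1; do
                    # Builds with UNCHANGED_BINARY_FLAGS=1 get a "b" in the name
                    if [ "$ubf" -eq 1 ]; then suffix="b"; else suffix=""; fi
                    echo "make CC=$CC CALL_BY_REF=$ref GLOBAL_RND=$rnd GLOBAL_FLAGS=$flags UNCHANGED_BINARY_FLAGS=$ubf  # -> ${CC}${ref}${rnd}${flags}${suffix}libbid.a"
                done
            done
        done
    done
}

list_variants
```

In a real build each printed command would be followed by a rename of libbid.a and a "make clean" before the next variant is built.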
Note that all the necessary versions of the Intel(R) Decimal Floating-Point Math Library have to be built in this directory prior to executing the similar RUN* command in either of ../EXAMPLES/ or ../TESTS/. Check that all the expected *.a files (or *.lib in Windows) have been created prior to building and running the tests or examples from ../EXAMPLES/ or ../TESTS/. Note: ===== If the makefile provided here is not used, the parameter passing method and local/global rounding mode and status flags may be selected by editing bid_conf.h: Parameter passing is determined by a preprocessor macro in bid_conf.h: - by value: #define DECIMAL_CALL_BY_REFERENCE 0 - by reference: #define DECIMAL_CALL_BY_REFERENCE 1 The use of global variables is determined by two preprocessor macros in bid_conf.h: - rnd_mode passed as parameter #define DECIMAL_GLOBAL_ROUNDING 0 - rnd_mode global #define DECIMAL_GLOBAL_ROUNDING 1 - status flags *pfpsf passed as parameter #define DECIMAL_GLOBAL_EXCEPTION_FLAGS 0 - status flags *pfpsf global #define DECIMAL_GLOBAL_EXCEPTION_FLAGS 1 For more information see ../README * Other names and brands may be claimed as the property of others. ** Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries LIBRARY/float128/ LIBRARY/float128/dpml_cbrt_t_table.c /****************************************************************************** Copyright (c) 2007-2024, Intel Corp. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/ #include "endian.h" #include "dpml_private.h" #if !defined(CBRT_TABLE_NAME) #define CBRT_TABLE_NAME __dpml_bid_cbrt_t_table #endif #if !TABLE_IS_EXTERNAL const unsigned int CBRT_TABLE_NAME[] = { /* 1.0 in double precision */ /* 000 */ DATA_1x2( 0x00000000, 0x3ff00000 ), /* coeffs to approx cbrt(f) */ /* 008 */ DATA_1x2( 0x39cf22de, 0x3fd929ac ), /* 016 */ DATA_1x2( 0x87730846, 0x3ff40b90 ), /* 024 */ DATA_1x2( 0xd7a3e848, 0xbff68ee7 ), /* 032 */ DATA_1x2( 0x80541937, 0x3ff6fbd1 ), /* 040 */ DATA_1x2( 0xdf5da719, 0xbff16685 ), /* 048 */ DATA_1x2( 0x1e6f904c, 0x3fe2d9c2 ), /* 056 */ DATA_1x2( 0xe1c69901, 0xbfcc5a29 ), /* 064 */ DATA_1x2( 0xda48850f, 0x3fac1c77 ), /* 072 */ DATA_1x2( 0xdc0d2f07, 0xbf8086db ), /* 080 */ DATA_1x2( 0x917c7fe0, 0x3f417636 ), /* coeffs to approx 1/cbrt(f)^2 */ /* 088 */ DATA_1x2( 0x881b89ed, 0x400e1506 ), /* 096 */ DATA_1x2( 0xc23a3b85, 0xc020ed35 ), /* 104 */ DATA_1x2( 0x7264fecc, 0x4029d89a ), /* 112 */ DATA_1x2( 0xae7a02bd, 0xc02a0b85 ), /* 120 */ DATA_1x2( 0xb037ffa0, 0x402196ed ), /* 128 */ DATA_1x2( 0xec98f3cd, 0xc00fa3c4 ), /* 136 */ DATA_1x2( 0x7b6b49c4, 0x3ff2394e ), /* 144 */ DATA_1x2( 0xc4f387a6, 0xbfc85e9d ), /* 152 */ DATA_1x2( 0x3810ace9, 0x3f8cce2f ), /* cube roots of 2^i, i = 0,1,2 in full and lo */ /* 160 */ DATA_1x2( 0x00000000, 0x3ff00000 ), /* 168 */ DATA_1x2( 0x00000000, 0x20000000 ), /* 176 */ DATA_1x2( 0xf98d728b, 0x3ff428a2 ), /* 184 */ DATA_1x2( 0xc4400000, 0x3deae515 ), /* 192 */ DATA_1x2( 0xa53d6e3d, 0x3ff965fe ), /* 200 */ DATA_1x2( 0x82b00000, 0x3dfd6e3c ), /* Numerical constants */ /* 208 */ DATA_1x2( 0x00000000, 0x42d00000 ), /* 216 */ DATA_1x2( 0x38e38e39, 0x3fe8e38e ), /* 224 */ DATA_1x2( 0x92492492, 0x3fc24924 ), /* 232 */ DATA_1x2( 0xb6db6db7, 0x3fe6db6d ), /* 240 */ DATA_1x2( 0x00000000, 0x402c0000 ), /* 248 */ DATA_1x2( 0x00000000, 0x401c0000 ), /* 256 */ DATA_1x2( 0x1c71c71c, 0x3fbc71c7 ), }; #else extern const TABLE_UNION 
CBRT_TABLE_NAME[66]; #endif #define ONE_D *((double *) ((char *)CBRT_TABLE_NAME + 0)) #define CBRT_POLY_ADDR ((double *) ((char *)CBRT_TABLE_NAME + 8)) #define REC_CBRT_POLY_ADDR ((double *) ((char *)CBRT_TABLE_NAME + 88)) #define OFFSET_OF_CBRTS_OF_2 160 #define BIG_QUAD *((double *) ((char *)CBRT_TABLE_NAME + 208)) #define SEVEN_NINTHS *((double *) ((char *)CBRT_TABLE_NAME + 216)) #define ONE_SEVENTH *((double *) ((char *)CBRT_TABLE_NAME + 224)) #define FIVE_SEVENTHS *((double *) ((char *)CBRT_TABLE_NAME + 232)) #define FOURTEEN *((double *) ((char *)CBRT_TABLE_NAME + 240)) #define SEVEN *((double *) ((char *)CBRT_TABLE_NAME + 248)) #define NINTH *((double *) ((char *)CBRT_TABLE_NAME + 256)) # define CBRT_POLY_M(x) ((((CBRT_POLY_ADDR[0]+x*CBRT_POLY_ADDR[1])+(x*x)*CBRT_POLY_ADDR[2])+(x*(x*x))*(CBRT_POLY_ADDR[3] \ +x*CBRT_POLY_ADDR[4]))+((x*x)*(x*(x*x)))*(((CBRT_POLY_ADDR[5]+x*CBRT_POLY_ADDR[6])+(x*x)*CBRT_POLY_ADDR[7]) \ +(x*(x*x))*(CBRT_POLY_ADDR[8]+x*CBRT_POLY_ADDR[9]))) # define CBRT_POLY_C(x) (CBRT_POLY_ADDR[0]+x*(CBRT_POLY_ADDR[1]+x*(CBRT_POLY_ADDR[2]+x*(CBRT_POLY_ADDR[3] \ +x*(CBRT_POLY_ADDR[4]+x*(CBRT_POLY_ADDR[5]+x*(CBRT_POLY_ADDR[6]+x*(CBRT_POLY_ADDR[7] \ +x*(CBRT_POLY_ADDR[8]+x*CBRT_POLY_ADDR[9]))))))))) # define CBRT_POLY SELECT_POLY(CBRT_POLY_) # define RECIP_CBRT_POLY_M(x) (((REC_CBRT_POLY_ADDR[0]+x*REC_CBRT_POLY_ADDR[1])+(x*x)*(REC_CBRT_POLY_ADDR[2]+x*REC_CBRT_POLY_ADDR[3])) \ +((x*x)*(x*x))*(((REC_CBRT_POLY_ADDR[4]+x*REC_CBRT_POLY_ADDR[5])+(x*x)*REC_CBRT_POLY_ADDR[6])+(x*(x*x))*(REC_CBRT_POLY_ADDR[7] \ +x*REC_CBRT_POLY_ADDR[8]))) # define RECIP_CBRT_POLY_C(x) (REC_CBRT_POLY_ADDR[0]+x*(REC_CBRT_POLY_ADDR[1]+x*(REC_CBRT_POLY_ADDR[2]+x*(REC_CBRT_POLY_ADDR[3] \ +x*(REC_CBRT_POLY_ADDR[4]+x*(REC_CBRT_POLY_ADDR[5]+x*(REC_CBRT_POLY_ADDR[6]+x*(REC_CBRT_POLY_ADDR[7] \ +x*REC_CBRT_POLY_ADDR[8])))))))) # define RECIP_CBRT_POLY SELECT_POLY(RECIP_CBRT_POLY_) LIBRARY/float128/dpml_ux_ops.c0000644€­ Q00042560000007216714616534611016144 0ustar 
aakkasintelall/****************************************************************************** Copyright (c) 2007-2024, Intel Corp. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ******************************************************************************/ #include /* File: dpml_ux_ops.c */ /* ** Facility: ** ** DPML ** ** Abstract: ** ** This file contains source code for the basic operations used in the ** unpacked x-float library that are independent of word size: pack, ** unpack, addsub and normalize ** ** Modification History: ** ** 1-001 Version 1. 
RNH 01-Sep-95 ** 1-002 Made PACK and UNPACK take error information. RNH 17-Sep-95 ** 1-003 Adding missing return value in UNPACK2. RNH 09-May-98 ** 1-004 Modified FFS_AND_SHIFT and PACK to respect signed zeros; ** Changed the representation of the default canonical NaN ** RNH 29-Jun-07 ** */ #include "dpml_ux.h" /* Pick up packed constant table */ #undef INSTANTIATE_TABLE #undef INSTANTIATE_DEFINES #define INSTANTIATE_TABLE 1 #define INSTANTIATE_DEFINES 0 #include STR(DPML_UX_CONS_FILE_NAME) /* ** The FFS_AND_SHIFT routine finds the most significant non-zero bit in an ** UX_FLOAT value and aligns it with the MSB of a normalized UX_FLOAT value. ** The flags argument controls the interpretation of the input argument. ** If flags is one of FFS_CVT_WORD or FFS_CVT_U_WORD, then the high fraction ** digit is assumed to be a signed or unsigned word and all other fields are ** assumed to be undefined. */ WORD FFS_AND_SHIFT ( UX_FLOAT * argument, U_WORD flags) { WORD shift, cshift, num_digits, cnt; UX_SIGN_TYPE sign; UX_EXPONENT_TYPE exponent; UX_FRACTION_DIGIT_TYPE msd, lsd OTHER_DIGITS; D_UNION u; msd = G_UX_MSD(argument); if (FFS_NORMALIZE == flags) { /* Do a quick check for a normalized argument */ exponent = G_UX_EXPONENT(argument); if ((UX_SIGNED_FRACTION_DIGIT_TYPE) msd < 0) return 0; } else { sign = 0; exponent = BITS_PER_UX_FRACTION_DIGIT_TYPE; if ((FFS_CVT_WORD == flags) && ((UX_SIGNED_FRACTION_DIGIT_TYPE) msd < 0)) { sign = UX_SIGN_BIT; msd = -msd; } P_UX_MSD(argument, msd); CLR_UX_LOW_FRACTION(argument); P_UX_SIGN(argument, sign); } lsd = G_UX_LSD(argument); G_UX_OTHER_DIGITS(argument); num_digits = NUM_UX_FRACTION_DIGITS; cnt = 0; do { if (msd) goto find_shift; DIGIT_SHIFT_FRACTION_LEFT(lsd, msd); cnt += BITS_PER_UX_FRACTION_DIGIT_TYPE; } while (--num_digits); /* ** If we get here, we had a zero fraction. Set the exponent field ** accordingly, force the sign to positive and return. 
*/ P_UX_EXPONENT(argument, UX_ZERO_EXPONENT); P_UX_SIGN(argument, 0); return cnt; find_shift: /* Quick check to see if it is already normalized */ if ((UX_SIGNED_FRACTION_DIGIT_TYPE) msd >= 0) { /* The high bit is not set, see if any of the next four are set */ shift = (msd >> (BITS_PER_UX_FRACTION_DIGIT_TYPE - 6)) & 0x1e; if (shift) /* Figure out which bit is set by "table look-up" */ shift = ((((3 << 2*1) | (2 << 2*2) | (2 << 2*3) | (1 << 2*4) | (1 << 2*5) | (1 << 2*6) | (1 << 2*7) ) >> shift) & 0x3) + 1; else /* ** Get shift by converting to floating point and extracting the ** exponent field. In the 64 bit case, make sure there is no ** rounding on the convert. */ { if (BITS_PER_UX_FRACTION_DIGIT_TYPE == 32) u.f = (double) msd; else { UX_FRACTION_DIGIT_TYPE itmp; itmp = msd & ~0xff; itmp = itmp ? itmp : msd; u.f = (double) itmp; } shift = (BITS_PER_UX_FRACTION_DIGIT_TYPE - 1 + D_EXP_BIAS - D_NORM) - (u.D_SIGNED_HI_WORD >> D_EXP_POS) ; } cshift = BITS_PER_UX_FRACTION_DIGIT_TYPE - shift; BIT_SHIFT_FRACTION_LEFT(lsd, msd, shift, cshift); cnt += shift; } P_UX_MSD(argument, msd); P_UX_OTHER_DIGITS(argument); P_UX_LSD(argument, lsd); P_UX_EXPONENT(argument, exponent - cnt); return cnt; } /* ** The ADDSUB routine adds and/or subtracts two unpacked x-float values. ** The logic to determine the larger value is driven by the exponent fields ** only, so that it may be necessary to explicitly normalize the operands ** prior to calling ADDSUB. The flags argument allows for producing the ** sum, difference or both for signed or unsigned values */ #define DO_NORMALIZATION (2*NO_NORMALIZATION) void ADDSUB ( UX_FLOAT * x, UX_FLOAT *y, U_WORD flags, UX_FLOAT * result) { WORD shift, cshift, cnt, op, tmp1, tmp2; UX_FLOAT * ux_tmp, ux_save; UX_SIGN_TYPE sign; UX_EXPONENT_TYPE exponent; UX_FRACTION_DIGIT_TYPE msd, lsd, tmp_digit, carry OTHER_DIGITS; /* ** See if we are doing an implicit addition or subtraction. 
This logic ** depends upon ADD = 0 and SUB = 1 */ # if (ADD != 0) || (SUB != 1) # error "Must have ADD = 0 and SUB = 1" # endif sign = G_UX_SIGN(x); op = flags << (BITS_PER_UX_SIGN_TYPE - 1); tmp1 = (op^sign)^G_UX_SIGN(y); tmp2 = flags & MAGNITUDE_ONLY; sign = tmp2 ? 0 : sign; op = tmp2 ? op : tmp1; op = (op >> (BITS_PER_UX_SIGN_TYPE - 1)) & 1; /* ** Determine larger value, call it x and the smaller y. In the process ** keep track of whether or not a swap takes place so that we can get ** the correct sign of the second result on a combined operation */ exponent = G_UX_EXPONENT(x); shift = exponent - G_UX_EXPONENT(y); P_UX_SIGN(&ux_save, 0); if (shift < 0) { ux_tmp = x; x = y; y = ux_tmp; shift = -shift; exponent += shift; P_UX_SIGN(&ux_save, UX_SIGN_BIT); sign ^= ((op == ADD) ? 0 : UX_SIGN_BIT); } /* Now align digits of the smaller value */ lsd = G_UX_LSD(y); G_UX_OTHER_DIGITS(y); msd = G_UX_MSD(y); cnt = NUM_UX_FRACTION_DIGITS; do { cshift = BITS_PER_UX_FRACTION_DIGIT_TYPE - shift; if (cshift > 0) goto bit_shift; DIGIT_SHIFT_FRACTION_RIGHT(lsd, msd); shift = -cshift; } while (--cnt); /* ** If we get here, there was a *VERY* big alignment shift, so we ** copy the answer to the result */ UX_COPY(x, result); P_UX_SIGN(result, sign); if ((flags & 0x2)) { result++; UX_COPY(x, result); P_UX_SIGN(result, sign ^ G_UX_SIGN(&ux_save)); } return; bit_shift: if (shift) BIT_SHIFT_FRACTION_RIGHT(lsd, msd, shift, cshift); /* ** Save shifted value in case we are dealing with an ADD_SUB op */ P_UX_MSD(&ux_save, msd); P_UX_LSD(&ux_save, lsd); P_UX_OTHER_DIGITS(&ux_save); /* ** Now do the operation. The purpose of the do-loop is to ease processing ** of the ADD_SUB and SUB_ADD cases. */ do { tmp_digit = G_UX_LSD(x); if (op == ADD) { /* ** Addition code. 
Turn off normalization */ flags &= (DO_NORMALIZATION - 1); lsd += tmp_digit; carry = (lsd < tmp_digit); # if NUM_UX_FRACTION_DIGITS == 4 tmp_digit = G_UX_FRACTION_DIGIT(x, 2); _F2 += carry; carry = (_F2 < carry); _F2 += tmp_digit; carry += (_F2 < tmp_digit); tmp_digit = G_UX_FRACTION_DIGIT(x, 1); _F1 += carry; carry = (_F1 < carry); _F1 += tmp_digit; carry += (_F1 < tmp_digit); # endif tmp_digit = G_UX_MSD(x); msd += carry; carry = (msd < carry); msd += tmp_digit; carry += (msd < tmp_digit); /* If carry is set, we need to normalize the fraction field */ if (carry) { BIT_SHIFT_FRACTION_RIGHT(lsd, msd, 1, BITS_PER_UX_FRACTION_DIGIT_TYPE - 1); msd |= UX_MSB; exponent++; } } else { /* ** Subtraction code. Set normalization flag */ flags -= NO_NORMALIZATION; carry = (lsd > tmp_digit); lsd = tmp_digit - lsd; # if NUM_UX_FRACTION_DIGITS == 4 tmp_digit = G_UX_FRACTION_DIGIT(x, 2); _F2 += carry; carry = (_F2 < carry); _F2 = tmp_digit - _F2; carry += (tmp_digit < _F2); tmp_digit = G_UX_FRACTION_DIGIT(x, 1); _F1 += carry; carry = (_F1 < carry); _F1 = tmp_digit - _F1; carry += (tmp_digit < _F1); # endif tmp_digit = G_UX_MSD(x); msd += carry; carry = (msd < carry); msd = tmp_digit - msd; carry += (tmp_digit < msd); /* ** If carry is set, we guessed wrong about which was larger so ** negate the result */ if (carry) { sign ^= UX_SIGN_BIT; P_UX_SIGN(&ux_save, UX_SIGN_BIT); lsd = -lsd; carry = (lsd == 0) ? 0 : -1; # if NUM_UX_FRACTION_DIGITS == 4 _F2 = carry - _F2; carry = (_F2 == 0) ? carry : -1; _F1 = carry - _F1; carry = (_F1 == 0) ? carry : -1; # endif msd = carry - msd; } } P_UX_MSD(result, msd); P_UX_LSD(result, lsd); P_UX_OTHER_DIGITS(result); P_UX_EXPONENT(result, exponent); P_UX_SIGN(result, sign); if (flags & DO_NORMALIZATION) NORMALIZE(result); if (0 == (flags & 0x2)) /* Single op. Quit now */ break; /* This is a dual op. 
Do the second part */ op = 1 - op; flags ^= 0x2; result ++; msd = G_UX_MSD(&ux_save); lsd = G_UX_LSD(&ux_save); G_UX_OTHER_DIGITS(&ux_save); sign ^= G_UX_SIGN(&ux_save); exponent = G_UX_EXPONENT(x); } while (1); } /* ** UNPACK_X_OR_Y unpacks one of two x-float arguments and handles any ** special FP classes based on the class_to_action_map. If the second ** argument, y, is non-zero, it is unpacked, otherwise the first argument is ** unpacked. */ #define SHIFT F_EXP_WIDTH #define CSHIFT (BITS_PER_UX_FRACTION_DIGIT_TYPE - F_EXP_WIDTH) /* ** The following macro changes/definitions are required to get the standard ** exception dispatcher interface macros to work */ #undef F_TYPE #define F_TYPE _X_FLOAT #undef P_EXCPTN_VALUE_x #define P_EXCPTN_VALUE_x(x,v) x.ld = *((F_TYPE *) v) #define TYPE_MASK MAKE_MASK(TYPE_WIDTH-1, TYPE_POS) #undef ADD_ERR_CODE_TYPE #define ADD_ERR_CODE_TYPE(e) (((F_TYPE_ENUM << TYPE_POS) & TYPE_MASK) \ | ((e) & (~TYPE_MASK))) WORD UNPACK_X_OR_Y( _X_FLOAT * packed_x, _X_FLOAT * packed_y, UX_FLOAT * unpacked_argument, U_WORD const * class_to_action_map, _X_FLOAT * packed_result OPT_EXCEPTION_INFO_DECLARATION) { WORD fp_class, sign, action, disp, index, action_index, map_element, shift, index_limit; UX_FRACTION_DIGIT_TYPE exponent_digit, cur_digit, next_digit, inf_nan, zero_denorm, fract_bits, * digit_ptr; UX_EXPONENT_TYPE exponent; _X_FLOAT * packed_value, * packed_argument, *tmp_ptr; EXCEPTION_RECORD_DECLARATION # if (BITS_PER_WORD == 32) WORD tmp; # endif /* ** Start unpacking the argument by fetching the exponent word and ** decomposing it into its sign, exponent and high fraction bit components */ index_limit = (packed_y != 0); packed_argument = index_limit ? 
packed_y : packed_x; IF_OPTNL_ERROR_INFO( EXCPTN_INFO->args[index_limit] = packed_argument); exponent_digit = G_X_DIGIT( packed_argument, 0); cur_digit = UX_MSB; P_UX_SIGN(unpacked_argument, (exponent_digit & cur_digit) >> (BITS_PER_UX_FRACTION_DIGIT_TYPE - BITS_PER_UX_SIGN_TYPE)); P_UX_EXPONENT(unpacked_argument, ((exponent_digit >> F_EXP_POS) & MAKE_MASK( F_EXP_WIDTH, 0)) - F_EXP_BIAS + 1); cur_digit |= (exponent_digit << SHIFT); /* ** Now get the remaining fraction bits, align them, and put them into ** the unpacked argument. While we're fetching the fraction bits, generate ** the logical 'or' of all of them to be used for the classification ** logic later on. */ next_digit = G_X_DIGIT( packed_argument, 1); fract_bits = cur_digit + cur_digit; cur_digit |= (next_digit >> CSHIFT); P_UX_MSD( unpacked_argument, cur_digit); cur_digit = next_digit << SHIFT; fract_bits |= next_digit; # if (BITS_PER_WORD == 32) next_digit = G_X_DIGIT( packed_argument, 2); cur_digit |= (next_digit >> CSHIFT); P_UX_FRACTION_DIGIT( unpacked_argument, 1, cur_digit); cur_digit = next_digit << SHIFT; fract_bits |= next_digit; next_digit = G_X_DIGIT( packed_argument, 3); cur_digit |= (next_digit >> CSHIFT); P_UX_FRACTION_DIGIT( unpacked_argument, 2, cur_digit); cur_digit = next_digit << SHIFT; fract_bits |= next_digit; # endif P_UX_LSD( unpacked_argument, cur_digit); /* ** We've unpacked the argument, now start the classification process */ zero_denorm = exponent_digit - F_HIDDEN_BIT_MASK; inf_nan = exponent_digit + F_HIDDEN_BIT_MASK; sign = exponent_digit >> F_SIGN_BIT_POS; fp_class = F_C_POS_NORM; if ((WORD) (inf_nan ^ zero_denorm) < 0 ) { /* Input argument is +/-0, +/-denorm, +/- Infinity, [SQ]NaN */ if ((WORD) (zero_denorm ^ exponent_digit) < 0) { /* argument was +/- zero or +/- denorm */ if (!fract_bits) fp_class = F_C_POS_ZERO; else { /* denorm, undo hidden bit, adjust exponent and normalize */ P_UX_MSD(unpacked_argument, G_UX_MSD(unpacked_argument) - UX_MSB); 
UX_INCR_EXPONENT(unpacked_argument, 1); NORMALIZE(unpacked_argument); fp_class = F_C_POS_DENORM; } } else { /* argument was +/- Inf or [SQ]NaN */ if (!fract_bits) fp_class = F_C_POS_INF; else { /* NaN */ fp_class = F_C_SIG_NAN; sign = exponent_digit >> (F_EXP_POS - 1) & 1; } } } fp_class += sign; IF_OPTNL_ERROR_INFO( EXCPTN_INFO->arg_classes = (1 << fp_class)); /* Now get the class to action mapping index and action */ shift = fp_class*(INDEX_WIDTH + ACTION_WIDTH); # if (BITS_PER_WORD == 64) map_element = class_to_action_map[0]; action_index = map_element >> shift; disp = map_element >> 60; # else map_element = class_to_action_map[0]; tmp = class_to_action_map[1]; if (fp_class < F_C_NEG_NORM) action_index = map_element >> shift; else action_index = map_element >> (shift - F_C_NEG_NORM*(INDEX_WIDTH + ACTION_WIDTH)); disp = ((map_element >> 31) & 0x6) | ((tmp >> 29) & 0x18); # endif index = action_index & INDEX_MASK; action = (action_index >> ACTION_POS) & INDEX_MASK; /* Leave now if all we have to do is unpack the argument */ if (action == RETURN_UNPACKED) return fp_class; //printf("UNPACK %llx, %llx\n", (long long)fp_class, index); /* ** If index is not 0 or 1, then the base return value is in the class to ** action mapping table. Otherwise, the base value is the input argument ** or the auxiliary argument. */ if (index <= index_limit) { digit_ptr = (UX_FRACTION_DIGIT_TYPE *) (index == 0 ? packed_x : packed_y); } else { index = WORDS_PER_CLASS_TO_ACTION_MAP*(disp & 0xf) + index - 1; index = class_to_action_map[ index ]; digit_ptr = (UX_FRACTION_DIGIT_TYPE *) & ((_X_FLOAT *) PACKED_CONSTANT_TABLE)[index]; //printf("UNPACK 3 %llx, %llx d= %llx, %llx\n", (long long)fp_class, index, digit_ptr[0],digit_ptr[1]); } /* ** If this is an error action, process the exception and get the final ** return value from the exception handler. Otherwise, manipulate ** the base value to get the final return value. 
*/ if (action == RETURN_ERROR) { index = ADD_ERR_CODE_TYPE(index); GET_EXCEPTION_RESULT_2(index, packed_x, packed_y, *packed_result); } else { exponent_digit = G_X_DIGIT(digit_ptr, MSD_NUM); switch (action) { case RETURN_QUIET_NAN: // exponent_digit &= (~SET_BIT(F_EXP_POS - 1)); exponent_digit |= (SET_BIT(F_EXP_POS - 1)); break; case RETURN_NEGATIVE: exponent_digit ^= (F_SIGN_BIT_MASK); break; case RETURN_ABSOLUTE: exponent_digit &= (~F_SIGN_BIT_MASK); break; case RETURN_CPYSN_ARG_0: exponent_digit = (exponent_digit & (~F_SIGN_BIT_MASK)) | (G_X_DIGIT(packed_x, MSD_NUM) & F_SIGN_BIT_MASK); break; case RETURN_VALUE: default: break; } /* Copy the final result to the packed result and return */ P_X_DIGIT(packed_result, MSD_NUM, exponent_digit); # if BITS_PER_WORD == 32 P_X_DIGIT(packed_result, 1, G_X_DIGIT(digit_ptr, 1)); P_X_DIGIT(packed_result, 2, G_X_DIGIT(digit_ptr, 2)); # endif P_X_DIGIT(packed_result, LSD_NUM, G_X_DIGIT(digit_ptr, LSD_NUM)); } return fp_class | ((WORD) 1 << (BITS_PER_WORD - 1)); } /* ** UNPACK2 is an interface layer that deals with processing the input ** arguments for 2 argument functions. Basically, this routine calls ** UNPACK_X_OR_Y twice and processes the more complicated class_to_action ** mappings associated with two argument functions. */ WORD UNPACK2( _X_FLOAT * packed_x, _X_FLOAT * packed_y, UX_FLOAT * unpacked_x, UX_FLOAT * unpacked_y, U_WORD const * class_to_action_map, _X_FLOAT * packed_result OPT_EXCEPTION_INFO_DECLARATION) { WORD fp_class_x, fp_class_y, disp, shift; IF_OPTNL_ERROR_INFO( U_WORD arg_classes; ) fp_class_x = UNPACK( packed_x, unpacked_x, class_to_action_map, packed_result OPT_EXCEPTION_INFO_ARGUMENT ); if (0 > fp_class_x) return fp_class_x; /* ** Check for NULL second argument. This allows for UNPACK2 to ** process single arguments if it has to */ if ( ! 
packed_y ) return fp_class_x; shift = F_C_CLASS_BIT_WIDTH*fp_class_x; # if (BITS_PER_WORD == 64) disp = (U_WORD) class_to_action_map[1]; # else if (fp_class_x < (BITS_PER_WORD/F_C_CLASS_BIT_WIDTH)) disp = (U_WORD) class_to_action_map[2]; else { disp = (U_WORD) class_to_action_map[3]; shift -= (BITS_PER_WORD/F_C_CLASS_BIT_WIDTH); } # endif disp = (disp >> (shift - 3)) & MAKE_MASK(F_C_CLASS_BIT_WIDTH, 3); IF_OPTNL_ERROR_INFO( arg_classes = EXCPTN_INFO->arg_classes ); fp_class_y = UNPACK_X_OR_Y( packed_x, packed_y, unpacked_y, (U_WORD *) ((char *) class_to_action_map + disp), packed_result OPT_EXCEPTION_INFO_ARGUMENT); IF_OPTNL_ERROR_INFO( EXCPTN_INFO->arg_classes |= arg_classes ); return fp_class_y | (fp_class_x << F_C_CLASS_BIT_WIDTH); } /* ** The PACK routine converts unpacked x-float arguments to packed and deals ** with any overflow, underflow or denorm conditions that might result. */ void PACK ( UX_FLOAT * unpacked_result, _X_FLOAT * packed_result, U_WORD underflow_error, U_WORD overflow_error OPT_EXCEPTION_INFO_DECLARATION ) { WORD shift, error_code; UX_FLOAT tmp; _X_FLOAT * x_ptr; UX_EXPONENT_TYPE exponent; UX_FRACTION_DIGIT_TYPE incr, tmp_digit, next_digit, current_digit, * error_val_ptr; EXCEPTION_RECORD_DECLARATION /* ** Start by "normalizing" any denormal results. 
Also screen out any ** (encoded) zeros, since they screw up the rest of the logic */ NORMALIZE(unpacked_result); exponent = G_UX_EXPONENT(unpacked_result); if (exponent == UX_ZERO_EXPONENT) { next_digit = G_UX_SIGN(unpacked_result); # if (NUM_UX_DIGITS == 4) packed_result->digit[2] = 0; packed_result->digit[3] = 0; # else next_digit <<= UX_SIGN_SHIFT; # endif //packed_result->digit[1] = 0; // packed_result->digit[0] = 0; P_X_DIGIT(packed_result, LSD_NUM, 0); P_X_DIGIT(packed_result, 0, next_digit); return; } shift = (F_MIN_BIN_EXP + 1) - exponent; if (shift > 0) { SET_UX_FRACTION_TO_HALF(&tmp); P_UX_EXPONENT(&tmp, exponent + shift); P_UX_SIGN(&tmp, G_UX_SIGN(unpacked_result)); ADDSUB(&tmp, unpacked_result, ADD, unpacked_result); /* ** We need to distinguish between zero, denorms and underflow here. ** In all cases, the fraction field will be correct. However, we ** need to adjust the exponent value to get the right exponent field. */ exponent = 1-F_EXP_BIAS; if ((shift > F_PRECISION) && (shift != -(UX_ZERO_EXPONENT - F_MIN_BIN_EXP - 1))) /* Force underflow */ exponent--; } /* Now round result and shift right */ incr = (UX_FRACTION_DIGIT_TYPE) 1 << (F_EXP_WIDTH - 1); tmp_digit = G_UX_LSD(unpacked_result); current_digit = tmp_digit + incr; incr = (current_digit < incr); current_digit >>= SHIFT; tmp_digit = G_UX_2nd_LSD(unpacked_result); next_digit = tmp_digit + incr; incr = (next_digit < incr); current_digit |= (next_digit << CSHIFT); P_X_DIGIT(packed_result, LSD_NUM, current_digit); current_digit = (next_digit >> SHIFT); # if (NUM_UX_DIGITS == 4) tmp_digit = G_UX_FRACTION_DIGIT(unpacked_result, 1); next_digit = tmp_digit + incr; incr = (next_digit < incr); current_digit |= (next_digit << CSHIFT); P_X_DIGIT(packed_result, 2, current_digit); current_digit = (next_digit >> SHIFT); tmp_digit = G_UX_FRACTION_DIGIT(unpacked_result, 0); next_digit = tmp_digit + incr; incr = (next_digit < incr); current_digit |= (next_digit << CSHIFT); P_X_DIGIT(packed_result, 1, 
current_digit); current_digit = (next_digit >> SHIFT); # endif /* ** At this point, all of the fraction bits except the most significant ** have been written to the destination. Current_digit holds the most ** significant fraction (correctly aligned) and incr = 1 iff the rounded ** fraction is 1. */ if (incr) { exponent++; current_digit = (UX_MSB >> SHIFT); } /* Finish packing and check for overflow and underflow */ /* ** Pack sign and exponent. Be careful to convert to UX_FRACTION_DIGIT_TYPE ** first. Adjust exponent to reflect hidden bit in fraction field */ tmp_digit = exponent + ((F_EXP_BIAS - 1) - 1); current_digit += (tmp_digit << (CSHIFT - 1)); next_digit = G_UX_SIGN(unpacked_result); current_digit |= (next_digit << UX_SIGN_SHIFT); P_X_DIGIT(packed_result, 0, current_digit); /* If no overflow or underflow, we're done */ if ( tmp_digit < (MAKE_MASK(F_EXP_WIDTH, 0) - 1) ) /* OK, no overflow or underflow. */ return; /* Check for denorm and overflow/underflow processing */ if ( ++tmp_digit == 0 ) { # define IEEE_SPECIAL_ENCODING_MASK ( (1 << F_C_QUIET_NAN) \ | (1 << F_C_SIG_NAN) \ | (1 << F_C_POS_INF) \ | (1 << F_C_NEG_INF) \ | (1 << F_C_POS_DENORM) \ | (1 << F_C_NEG_DENORM) ) # define INPUT_WAS_IEEE_SPECIAL_ENCODING \ ((EXCPTN_INFO->arg_classes & IEEE_SPECIAL_ENCODING_MASK) != 0) if ( IF_OPTNL_ERROR_INFO( INPUT_WAS_IEEE_SPECIAL_ENCODING || ) PROCESS_DENORMS ) return; } error_code = (exponent < 0) ? underflow_error : overflow_error; error_code = ADD_ERR_CODE_TYPE(error_code); GET_EXCEPTION_RESULT_2(error_code, packed_x, packed_y, *packed_result); } /* ** For very ill behaved polynomial evaluations, we introduce a "packed" form ** of the coefficients to be used by a less efficient evaluation routine ** that unpacks the coefficients and evaluates the polynomial via Horner's ** scheme by calling the add/sub and multiply routines. 
**
** The special format used looks like:
**
**      |<----------------------- 128 bits -------------------->|
**      +---------------------------------------------------+---+-+
**      |                Normalized Fraction                | K |s|
**      +---------------------------------------------------+---+-+
**                                                       -->| w |<--
**
** where K is a biased scale field of w bits and s is the sign bit. The
** width of the biased scale factor, w, and the actual bias depend on the
** coefficients. In general, w is 4 bits or less.
**
** In a Horner's scheme evaluation of degree n, involving the coefficients
** c(k) and an argument x, the basic iteration (ignoring the signs) is of
** the form:
**
**      T(n) <-- c(n)
**      T(k) <-- c(k) + x*T(k+1)        for k = n-1, n-2, ..., 1, 0
**
** If we consider the binary exponent and fraction fields of the c(k)'s
** separately we can write c(k) = 2^e(k)*f(k), where 1/2 <= f(k) < 1. We
** now define the k-th scale factor s(k) as
**
**      s(0) = e(0)
**      s(k) = e(k) - e(k-1)            for k = 1, 2, ..., n
**
** Then consider the recursion:
**
**      t(n) <-- 2^s(n)*f(n)
**      t(k) <-- 2^s(k)*[f(k) + x*t(k+1)]       for k = n-1, n-2, ..., 1, 0
**
** It can be shown by induction that t(k) = T(k)/2^e(k-1) for k >= 1 and that
** t(0) = T(0).
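**
** The scaled recursion can be sketched in double precision as follows. This
** is an illustration only, not the library's method: horner_plain and
** horner_scaled are hypothetical names, frexp/ldexp stand in for the packed
** fraction and scale fields, and the scale factors are assumed to be
** s(0) = e(0) and s(k) = e(k) - e(k-1), chosen so that the final value
** equals T(0). The real routine works on packed 128-bit coefficients.
**
```c
#include <math.h>

// Plain Horner: T(n) = c(n), T(k) = c(k) + x*T(k+1).
static double horner_plain(const double *c, int n, double x)
{
    double t = c[n];
    for (int k = n - 1; k >= 0; k--)
        t = c[k] + x * t;
    return t;
}

// Scaled Horner: t(k) = 2^s(k) * [f(k) + x*t(k+1)], where
// c(k) = 2^e(k) * f(k), s(0) = e(0) and s(k) = e(k) - e(k-1).
static double horner_scaled(const double *c, int n, double x)
{
    int e_k, e_km1;
    double f_k = frexp(c[n], &e_k);   // c(n) = f(n) * 2^e(n)
    double t = 0.0;                   // no t(n+1) term on the first pass

    for (int k = n; k >= 0; k--) {
        double f_km1 = 0.0;
        e_km1 = 0;                    // makes s(0) = e(0) on the last pass
        if (k > 0)
            f_km1 = frexp(c[k - 1], &e_km1);
        t = ldexp(f_k + x * t, e_k - e_km1);   // apply the scale s(k)
        f_k = f_km1;
        e_k = e_km1;
    }
    return t;
}
```
**
** Inside the loop only fractions and small exponent differences appear, which
** is why a narrow biased scale field K is enough in the packed format. Both
** functions should agree up to rounding for any coefficient array of length
** n + 1.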
*/ #if NUM_UX_FRACTION_DIGITS == 2 # define COPY_MIDDLE_DIGITS(coef, ux) #else # define COPY_MIDDLE_DIGITS(coef, ux) \ P_UX_FRACTION_DIGIT(ux, 2, coef->digits[1]); \ P_UX_FRACTION_DIGIT(ux, 1, coef->digits[2]) #endif #define UNPACK_COEF_TO_UX(coef, ux, mask, bias, scale, op) \ { \ UX_FRACTION_DIGIT_TYPE lsd; \ \ P_UX_FRACTION_DIGIT(ux, 0, coef->digits[LSD_NUM]); \ COPY_MIDDLE_DIGITS(coef, ux); \ lsd = coef->digits[0]; \ P_UX_FRACTION_DIGIT(ux, LSD_NUM, lsd & ~mask); \ op = lsd & 1; \ scale = (((lsd >> 1) & mask) - bias); \ } void EVALUATE_PACKED_POLY( UX_FLOAT * argument, WORD degree, FIXED_128 * coefs, U_WORD mask, WORD bias, UX_FLOAT * result) { WORD op; UX_EXPONENT_TYPE scale; UX_FLOAT tmp; P_UX_SIGN(&tmp, 0); P_UX_EXPONENT(&tmp, 0); UNPACK_COEF_TO_UX(coefs, result, mask, bias, scale, op); P_UX_SIGN(result, (op == ADD) ? 0 : UX_SIGN_BIT ); P_UX_EXPONENT(result, scale); while (--degree >= 0) { MULTIPLY(argument, result, result); NORMALIZE(result); coefs++; UNPACK_COEF_TO_UX(coefs, &tmp, mask, bias, scale, op); ADDSUB(result, &tmp, op, result); UX_INCR_EXPONENT(result, scale); } } #if !defined GROUP double D_GROUP_NAME( double x ) { return x; } #endif LIBRARY/float128/dpml_ux_bid.c0000644€­ Q00042560000010111714616534611016065 0ustar aakkasintelall/****************************************************************************** Copyright (c) 2007-2024, Intel Corp. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 
    * Neither the name of Intel Corporation nor the names of its contributors
      may be used to endorse or promote products derived from this software
      without specific prior written permission.

  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
  ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
  ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
  LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
  ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
******************************************************************************/

#define BASE_NAME       bid
#include "dpml_ux.h"

#if !defined(MAKE_INCLUDE)
#   include STR(BUILD_FILE_NAME)
#endif

/*
** For the arithmetic operations (add, sub, mul and divide) it is sometimes
** necessary to return an invalid operation result as well as overflow and
** underflow. For the time being, this will be done by mapping the error codes
** for these operations onto existing error codes.
*/ #define MUL_ZERO_BY_INF SQRT_OF_NEGATIVE #define DIV_ZERO_BY_ZERO SQRT_OF_NEGATIVE #define DIV_BY_ZERO_POS COT_OF_ZERO #define DIV_BY_ZERO_NEG LOG_OF_ZERO #define DIV_INF_BY_INF SQRT_OF_NEGATIVE #define ADD_PINF_TO_NINF SQRT_OF_NEGATIVE #define SUB_INF_FROM_INF SQRT_OF_NEGATIVE /****************************************************************************** * * Basic arithmetic operations * *****************************************************************************/ #undef F_ENTRY_NAME #define F_ENTRY_NAME F_MUL_NAME X_XX_PROTO(F_ENTRY_NAME, packed_result, packed_x, packed_y) { WORD fp_class; UX_FLOAT unpacked_x, unpacked_y, unpacked_result; EXCEPTION_INFO_DECL DECLARE_X_FLOAT(packed_result) fp_class = UNPACK2( PASS_ARG_X_FLOAT(packed_x), PASS_ARG_X_FLOAT(packed_y), & unpacked_x, & unpacked_y, MUL_CLASS_TO_ACTION_MAP, PASS_RET_X_FLOAT(packed_result) OPT_EXCEPTION_INFO ); if (0 > fp_class) RETURN_X_FLOAT(packed_result); MULTIPLY( &unpacked_x, &unpacked_y, &unpacked_result); PACK( &unpacked_result, PASS_RET_X_FLOAT(packed_result), G_UX_SIGN(&unpacked_result) ? FMA_NEG_UNDERFLOW : FMA_POS_UNDERFLOW, G_UX_SIGN(&unpacked_result) ? FMA_NEG_OVERFLOW : FMA_POS_OVERFLOW OPT_EXCEPTION_INFO ); RETURN_X_FLOAT(packed_result); } #undef F_ENTRY_NAME #define F_ENTRY_NAME F_DIV_NAME X_XX_PROTO(F_ENTRY_NAME, packed_result, packed_x, packed_y) { WORD fp_class; UX_FLOAT unpacked_x, unpacked_y, unpacked_result; EXCEPTION_INFO_DECL DECLARE_X_FLOAT(packed_result) fp_class = UNPACK2( PASS_ARG_X_FLOAT(packed_x), PASS_ARG_X_FLOAT(packed_y), & unpacked_x, & unpacked_y, DIV_CLASS_TO_ACTION_MAP, PASS_RET_X_FLOAT(packed_result) OPT_EXCEPTION_INFO ); if (0 > fp_class) RETURN_X_FLOAT(packed_result); DIVIDE( &unpacked_x, &unpacked_y, 0, &unpacked_result); PACK( &unpacked_result, PASS_RET_X_FLOAT(packed_result), G_UX_SIGN(&unpacked_result) ? FMA_NEG_UNDERFLOW : FMA_POS_UNDERFLOW, G_UX_SIGN(&unpacked_result) ? 
FMA_NEG_OVERFLOW : FMA_POS_OVERFLOW OPT_EXCEPTION_INFO ); RETURN_X_FLOAT(packed_result); } #undef F_ENTRY_NAME #define F_ENTRY_NAME F_ADD_NAME X_XX_PROTO(F_ENTRY_NAME, packed_result, packed_x, packed_y) { WORD fp_class; UX_FLOAT unpacked_x, unpacked_y, unpacked_result; EXCEPTION_INFO_DECL DECLARE_X_FLOAT(packed_result) fp_class = UNPACK2( PASS_ARG_X_FLOAT(packed_x), PASS_ARG_X_FLOAT(packed_y), & unpacked_x, & unpacked_y, ADDITION_CLASS_TO_ACTION_MAP, PASS_RET_X_FLOAT(packed_result) OPT_EXCEPTION_INFO ); if (0 > fp_class) RETURN_X_FLOAT(packed_result); ADDSUB( &unpacked_x, &unpacked_y, ADD, &unpacked_result); PACK( &unpacked_result, PASS_RET_X_FLOAT(packed_result), G_UX_SIGN(&unpacked_result) ? FMA_NEG_UNDERFLOW : FMA_POS_UNDERFLOW, G_UX_SIGN(&unpacked_result) ? FMA_NEG_OVERFLOW : FMA_POS_OVERFLOW OPT_EXCEPTION_INFO ); RETURN_X_FLOAT(packed_result); } #undef F_ENTRY_NAME #define F_ENTRY_NAME F_SUB_NAME X_XX_PROTO(F_ENTRY_NAME, packed_result, packed_x, packed_y) { WORD fp_class; UX_FLOAT unpacked_x, unpacked_y, unpacked_result; EXCEPTION_INFO_DECL DECLARE_X_FLOAT(packed_result) fp_class = UNPACK2( PASS_ARG_X_FLOAT(packed_x), PASS_ARG_X_FLOAT(packed_y), & unpacked_x, & unpacked_y, SUBTRACTION_CLASS_TO_ACTION_MAP, PASS_RET_X_FLOAT(packed_result) OPT_EXCEPTION_INFO ); if (0 > fp_class) RETURN_X_FLOAT(packed_result); ADDSUB( &unpacked_x, &unpacked_y, SUB, &unpacked_result); PACK( &unpacked_result, PASS_RET_X_FLOAT(packed_result), G_UX_SIGN(&unpacked_result) ? FMA_NEG_UNDERFLOW : FMA_POS_UNDERFLOW, G_UX_SIGN(&unpacked_result) ? 
FMA_NEG_OVERFLOW : FMA_POS_OVERFLOW OPT_EXCEPTION_INFO ); RETURN_X_FLOAT(packed_result); } #undef F_ENTRY_NAME #define F_ENTRY_NAME F_NEG_NAME X_X_PROTO(F_ENTRY_NAME, packed_result, packed_x) { WORD fp_class; UX_FLOAT unpacked_x, unpacked_y, unpacked_result; EXCEPTION_INFO_DECL DECLARE_X_FLOAT(packed_result) fp_class = UNPACK( PASS_ARG_X_FLOAT(packed_x), & unpacked_x, NEGATE_CLASS_TO_ACTION_MAP, PASS_RET_X_FLOAT(packed_result) OPT_EXCEPTION_INFO ); RETURN_X_FLOAT(packed_result); } #undef F_ENTRY_NAME #define F_ENTRY_NAME F_FABS_NAME X_X_PROTO(F_ENTRY_NAME, packed_result, packed_x) { WORD fp_class; UX_FLOAT unpacked_x, unpacked_y, unpacked_result; EXCEPTION_INFO_DECL DECLARE_X_FLOAT(packed_result) fp_class = UNPACK( PASS_ARG_X_FLOAT(packed_x), & unpacked_x, FABS_CLASS_TO_ACTION_MAP, PASS_RET_X_FLOAT(packed_result) OPT_EXCEPTION_INFO ); RETURN_X_FLOAT(packed_result); } #undef F_ENTRY_NAME #define F_ENTRY_NAME F_ITOF_NAME X_I_PROTO(F_ENTRY_NAME, packed_result, i) { UX_SIGN_TYPE sign; UX_FRACTION_DIGIT_TYPE msd, mask; UX_EXPONENT_TYPE exponent, cnt; UX_FLOAT unpacked_result; DECLARE_X_FLOAT(packed_result) EXCEPTION_INFO_DECL #define ITOF_SHIFT (BITS_PER_UX_FRACTION_DIGIT_TYPE - 32) sign = 0; msd = i; if ( i == 0 ) { exponent = 0; } else { exponent = 32; cnt = 16; if ( (UX_SIGNED_FRACTION_DIGIT_TYPE) msd < 0 ) { msd = -msd; sign = 1; } mask = ((UX_FRACTION_DIGIT_TYPE) 0xffff0000) << ITOF_SHIFT; msd = ((UX_FRACTION_DIGIT_TYPE) msd) << ITOF_SHIFT; while ( cnt ) { if ( (mask & msd) == 0 ) { msd <<= cnt; exponent -= cnt; } cnt >>= 1; mask <<= cnt; } } UX_SET_SIGN_EXP_MSD(&unpacked_result, sign, exponent, msd); PACK( &unpacked_result, PASS_RET_X_FLOAT(packed_result), /* Not Used */ 0, /* Not Used */ 0 OPT_EXCEPTION_INFO ); RETURN_X_FLOAT(packed_result); } #define LT 0 #define EQ 1 #define GT 2 #define UN 3 #define DO 4 #define NUM_CMP_BITS 3 #define CPACK(a,b,c,d,e,f,g,h,i,j) \ ( ((a) << (F_C_NEG_INF * NUM_CMP_BITS)) | \ ((b) << (F_C_NEG_NORM * NUM_CMP_BITS)) | \ ((c) << 
(F_C_NEG_DENORM * NUM_CMP_BITS)) | \ ((d) << (F_C_NEG_ZERO * NUM_CMP_BITS)) | \ ((e) << (F_C_POS_ZERO * NUM_CMP_BITS)) | \ ((f) << (F_C_POS_DENORM * NUM_CMP_BITS)) | \ ((g) << (F_C_POS_NORM * NUM_CMP_BITS)) | \ ((h) << (F_C_POS_INF * NUM_CMP_BITS)) | \ ((i) << (F_C_SIG_NAN * NUM_CMP_BITS)) | \ ((j) << (F_C_QUIET_NAN * NUM_CMP_BITS)) ) static U_INT_32 cmpTable[] = { /* -Inf -Nrm -Dnrm -Zero +Zero +Dnrm +Nrm +Inf SNaN QNaN) */ /* -----------------------------------------------------------------------*/ /* SNaN */ CPACK( UN, UN, UN, UN, UN, UN, UN, UN, UN, UN ), /* QNaN */ CPACK( UN, UN, UN, UN, UN, UN, UN, UN, UN, UN ), /* +Inf */ CPACK( GT, GT, GT, GT, GT, GT, GT, EQ, UN, UN ), /* -Inf */ CPACK( EQ, LT, LT, LT, LT, LT, LT, LT, UN, UN ), /* +Nrm */ CPACK( GT, GT, GT, GT, GT, GT, DO, LT, UN, UN ), /* -Nrm */ CPACK( GT, DO, LT, LT, LT, LT, LT, LT, UN, UN ), /* +Dnrm */ CPACK( GT, GT, GT, GT, GT, DO, LT, LT, UN, UN ), /* -Dnrm */ CPACK( GT, GT, DO, LT, LT, LT, LT, LT, UN, UN ), /* +Zero */ CPACK( GT, GT, GT, EQ, EQ, LT, LT, LT, UN, UN ), /* -Zero */ CPACK( GT, GT, GT, EQ, EQ, LT, LT, LT, UN, UN ), }; #if !defined(UX_CMP) # define UX_CMP __INTERNAL_NAME(ux_cmp__) #endif static int UX_CMP( WORD x_class, UX_FLOAT * unpacked_x, WORD y_class, UX_FLOAT * unpacked_y ) { UX_SIGN_TYPE sign; int i, order; WORD diff; order = (cmpTable[ x_class ] >> (NUM_CMP_BITS * y_class)) & MAKE_MASK(NUM_CMP_BITS,0); if ( order == DO ) { // Both arguments have the same sign diff = ((WORD) G_UX_EXPONENT(unpacked_x)) - ((WORD) G_UX_EXPONENT(unpacked_y)); if (diff == 0) { for (i = 0; i < NUM_UX_FRACTION_DIGITS; i++) { diff = G_UX_FRACTION_DIGIT(unpacked_x, i) - G_UX_FRACTION_DIGIT(unpacked_y, i); if ( diff != 0 ) break; } } sign = G_UX_SIGN( unpacked_x ); if ( diff > 0 ) { order = sign ? LT : GT; } else if ( diff < 0 ) { order = sign ? 
GT : LT; } else { order = EQ; } } return order; } #undef F_ENTRY_NAME #define F_ENTRY_NAME F_CMP_NAME I_XXI_PROTO(F_ENTRY_NAME, packed_x, packed_y, predicate) { WORD fp_class, x_class, y_class; UX_FLOAT unpacked_x, unpacked_y, unpacked_result; int order; EXCEPTION_INFO_DECL _X_FLOAT dummy; fp_class = UNPACK2( PASS_ARG_X_FLOAT(packed_x), PASS_ARG_X_FLOAT(packed_y), & unpacked_x, & unpacked_y, MUL_CLASS_TO_ACTION_MAP, &dummy OPT_EXCEPTION_INFO ); #define CLASS_MASK MAKE_MASK(F_C_CLASS_BIT_WIDTH,0); x_class = (fp_class >> F_C_CLASS_BIT_WIDTH) & CLASS_MASK; y_class = fp_class & CLASS_MASK; order = UX_CMP(x_class, &unpacked_x, y_class, &unpacked_y); return (predicate >> order) & 1; } #undef F_ENTRY_NAME #define F_ENTRY_NAME F_NEXTAFTER_NAME X_XX_PROTO(F_ENTRY_NAME, packed_result, packed_x, packed_y) { WORD fp_class, x_class, y_class, order; UX_FLOAT unpacked_x, unpacked_y; UX_EXPONENT_TYPE exponent; EXCEPTION_INFO_DECL DECLARE_X_FLOAT(packed_result) fp_class = UNPACK2( PASS_ARG_X_FLOAT(packed_x), PASS_ARG_X_FLOAT(packed_y), & unpacked_x, & unpacked_y, NEXTAFTER_CLASS_TO_ACTION_MAP, PASS_RET_X_FLOAT(packed_result) OPT_EXCEPTION_INFO ); if (0 > fp_class) RETURN_X_FLOAT(packed_result); x_class = (fp_class >> F_C_CLASS_BIT_WIDTH); y_class = fp_class & MAKE_MASK(F_C_CLASS_BIT_WIDTH,0); order = UX_CMP(x_class, &unpacked_x, y_class, &unpacked_y); // Create (denormalized) increment value CLR_UX_LOW_FRACTION( &unpacked_y ); P_UX_EXPONENT( &unpacked_y, G_UX_EXPONENT( &unpacked_x) ); if (order != EQ) { exponent = G_UX_EXPONENT( &unpacked_x); UX_SET_SIGN_EXP_MSD( &unpacked_y, order == LT ? 0 : UX_SIGN_BIT, exponent, 0); CLR_UX_LOW_FRACTION( &unpacked_y ); P_UX_LSD( &unpacked_y, 1 << 15 ); ADDSUB( &unpacked_x, &unpacked_y, ADD, &unpacked_x); } PACK( &unpacked_x, PASS_RET_X_FLOAT(packed_result), G_UX_SIGN(&unpacked_x) ? FMA_NEG_UNDERFLOW : FMA_POS_UNDERFLOW, G_UX_SIGN(&unpacked_x) ? 
FMA_NEG_OVERFLOW : FMA_POS_OVERFLOW OPT_EXCEPTION_INFO ); RETURN_X_FLOAT(packed_result); } #if defined(MAKE_INCLUDE) @divert -append divertText # undef TABLE_NAME START_TABLE; TABLE_COMMENT("Negate class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "NEGATE_CLASS_TO_ACTION_MAP"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(0) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_NEGATIVE, 0)); TABLE_COMMENT("Fabs class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "FABS_CLASS_TO_ACTION_MAP"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(0) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0)); TABLE_COMMENT("Nextafter class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "NEXTAFTER_CLASS_TO_ACTION_MAP"); /* Index 0: mapping for x */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(0) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) ); /* Index 1: class-to-index mapping */ PRINT_64_TBL_ITEM( CLASS_TO_INDEX( F_C_POS_ZERO, 2) + CLASS_TO_INDEX( F_C_NEG_ZERO, 2) + CLASS_TO_INDEX( F_C_POS_DENORM, 2) + 
CLASS_TO_INDEX( F_C_NEG_DENORM, 2) + CLASS_TO_INDEX( F_C_POS_NORM, 2) + CLASS_TO_INDEX( F_C_NEG_NORM, 2) + CLASS_TO_INDEX( F_C_POS_INF, 2) + CLASS_TO_INDEX( F_C_NEG_INF, 2) ); /* Index 2: y class-to-index mapping for x != SNaN or QNaN */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(0) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1) ); TABLE_COMMENT("Multiply class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "MUL_CLASS_TO_ACTION_MAP"); /* Index 0: mapping for x */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(5) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) ); /* Index 1: class-to-index mapping */ PRINT_64_TBL_ITEM( CLASS_TO_INDEX( F_C_POS_ZERO, 2) + CLASS_TO_INDEX( F_C_NEG_ZERO, 2) + CLASS_TO_INDEX( F_C_NEG_DENORM, 3) + CLASS_TO_INDEX( F_C_NEG_NORM, 3) + CLASS_TO_INDEX( F_C_POS_DENORM, 4) + CLASS_TO_INDEX( F_C_POS_NORM, 4) + CLASS_TO_INDEX( F_C_POS_INF, 5) + CLASS_TO_INDEX( F_C_NEG_INF, 5) ); /* Index 2: y class-to-index mapping for x = +/- zero */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(3) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 2) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_ERROR, 2) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); /* Index 3: y class-to-index mapping for x = -norm or -denorm */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(2) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_NEGATIVE, 1) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_NEG_ZERO, RETURN_NEGATIVE, 1) + CLASS_TO_ACTION( F_C_POS_ZERO, 
RETURN_NEGATIVE, 1) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_NEGATIVE, 1) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); /* Index 4: y class-to-index mapping for x = +norm or +denorm */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(1) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_NEG_ZERO, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); /* Index 5: y class-to-index mapping for x = +/-Inf */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(0) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO, RETURN_ERROR, 2) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_ERROR, 2) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); TABLE_COMMENT("Data for the above mapping"); PRINT_U_TBL_ITEM( /* data 2 */ MUL_ZERO_BY_INF ); TABLE_COMMENT("Divide class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "DIV_CLASS_TO_ACTION_MAP"); /* Index 0: mapping for x */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(5) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) ); /* Index 1: class-to-index mapping */ PRINT_64_TBL_ITEM( CLASS_TO_INDEX( F_C_POS_ZERO, 
2) + CLASS_TO_INDEX( F_C_NEG_ZERO, 2) + CLASS_TO_INDEX( F_C_NEG_DENORM, 3) + CLASS_TO_INDEX( F_C_NEG_NORM, 3) + CLASS_TO_INDEX( F_C_POS_DENORM, 4) + CLASS_TO_INDEX( F_C_POS_NORM, 4) + CLASS_TO_INDEX( F_C_POS_INF, 5) + CLASS_TO_INDEX( F_C_NEG_INF, 5) ); /* Index 2: y class-to-index mapping for x = +/- zero */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(3) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO, RETURN_ERROR, 3) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_ERROR, 3) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); /* Index 3: y class-to-index mapping for x = -norm or -denorm */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(2) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_VALUE, 2) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_NEG_ZERO, RETURN_ERROR, 4) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_ERROR, 5) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_NEGATIVE, 2) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); /* Index 4: y class-to-index mapping for x = +norm or +denorm */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(1) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_NEGATIVE, 2) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_NEG_ZERO, RETURN_ERROR, 5) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_ERROR, 4) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( 
F_C_POS_INF, RETURN_VALUE, 2) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); /* Index 5: y class-to-index mapping for x = +/-Inf */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(0) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 6) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO, RETURN_NEGATIVE, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_ERROR, 6) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); TABLE_COMMENT("Data for the above mapping"); PRINT_U_TBL_ITEM( /* data 2 */ ZERO ); PRINT_U_TBL_ITEM( /* data 3 */ DIV_ZERO_BY_ZERO ); PRINT_U_TBL_ITEM( /* data 4 */ DIV_BY_ZERO_POS ); PRINT_U_TBL_ITEM( /* data 5 */ DIV_BY_ZERO_NEG ); PRINT_U_TBL_ITEM( /* data 6 */ DIV_INF_BY_INF ); TABLE_COMMENT("Addition class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "ADDITION_CLASS_TO_ACTION_MAP"); /* Index 0: mapping for x */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(5) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) ); /* Index 1: class-to-index mapping */ PRINT_64_TBL_ITEM( CLASS_TO_INDEX( F_C_POS_ZERO, 2) + CLASS_TO_INDEX( F_C_NEG_ZERO, 2) + CLASS_TO_INDEX( F_C_NEG_DENORM, 3) + CLASS_TO_INDEX( F_C_NEG_NORM, 3) + CLASS_TO_INDEX( F_C_POS_DENORM, 3) + CLASS_TO_INDEX( F_C_POS_NORM, 3) + CLASS_TO_INDEX( F_C_NEG_INF, 4) + CLASS_TO_INDEX( F_C_POS_INF, 5) ); /* Index 2: y class-to-index mapping for x = +/- zero */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(3) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_NEG_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( 
F_C_POS_ZERO, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); /* Index 3: y class-to-index mapping for x = +/-norm or +/-denorm */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(2) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_NEG_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); /* Index 4: y class-to-index mapping for x = -Inf */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(1) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_ERROR, 2) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); /* Index 5: y class-to-index mapping for x = +Inf */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(0) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 2) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( 
F_C_POS_INF, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); TABLE_COMMENT("Data for the above mapping"); PRINT_U_TBL_ITEM( /* data 2 */ ADD_PINF_TO_NINF ); TABLE_COMMENT("Subtraction class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "SUBTRACTION_CLASS_TO_ACTION_MAP"); /* Index 0: mapping for x */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(5) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) ); /* Index 1: class-to-index mapping */ PRINT_64_TBL_ITEM( CLASS_TO_INDEX( F_C_POS_ZERO, 2) + CLASS_TO_INDEX( F_C_NEG_ZERO , 2) + CLASS_TO_INDEX( F_C_NEG_DENORM, 3) + CLASS_TO_INDEX( F_C_NEG_NORM, 3) + CLASS_TO_INDEX( F_C_POS_DENORM, 3) + CLASS_TO_INDEX( F_C_POS_NORM, 3) + CLASS_TO_INDEX( F_C_NEG_INF, 4) + CLASS_TO_INDEX( F_C_POS_INF, 5) ); /* Index 2: y class-to-index mapping for x = +/- zero */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(3) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_NEGATIVE, 1) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_NEGATIVE, 1) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_NEGATIVE, 1) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_NEGATIVE, 1) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_NEGATIVE, 1) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_NEGATIVE, 1) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_NEGATIVE, 1) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); /* Index 3: y class-to-index mapping for x = +/-norm or +/-denorm */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(2) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_NEGATIVE, 1) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_UNPACKED, 1) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_UNPACKED, 1) + 
CLASS_TO_ACTION( F_C_POS_INF, RETURN_NEGATIVE, 1) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); /* Index 4: y class-to-index mapping for x = -Inf */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(1) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 2) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); /* Index 5: y class-to-index mapping for x = +Inf */ PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(0) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_NORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_ERROR, 2) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 1) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 1)); TABLE_COMMENT("Data for the above mapping"); PRINT_U_TBL_ITEM( /* data 2 */ SUB_INF_FROM_INF ); PAD_IF_NEEDED(MP_BIT_OFFSET, 64); /* Print various powers of 2 */ TABLE_COMMENT("2^n, n = .5, 0, 24, 75, -24, -77 in double precision"); PRINT_R_TBL_VDEF_ITEM( "D_SQRT_TWO\t", sqrt(2)); PRINT_R_TBL_VDEF_ITEM( "D_ONE\t\t", 1); PRINT_R_TBL_VDEF_ITEM( "D_TWO_POW_24\t", bldexp(1, 24)); PRINT_R_TBL_VDEF_ITEM( "D_TWO_POW_75\t", bldexp(1, 75)); PRINT_R_TBL_VDEF_ITEM( "D_RECIP_TWO_POW_24", bldexp(1, -24)); PRINT_R_TBL_VDEF_ITEM( "D_RECIP_TWO_POW_77", bldexp(1, -77)); TABLE_COMMENT( "Rsqrt iteration (double precision) constants: 7/8 and 
3/8"); PRINT_R_TBL_VDEF_ITEM( "D_SEVEN_EIGHTS", 7/8); PRINT_R_TBL_VDEF_ITEM( "D_THREE_EIGHTS", 3/8); TABLE_COMMENT("3 in unpacked format"); PRINT_UX_TBL_ADEF_ITEM( "UX_THREE\t\t", 3); END_TABLE; @end_divert @eval my $tableText; \ my $outText = MphocEval( GetStream( "divertText" ) ); \ my $defineText = Egrep( "#define", $outText, \$tableText ); \ $outText = "$tableText\n\n$defineText"; \ my $headerText = GetHeaderText( STR(BUILD_FILE_NAME), \ "Definitions and constants square root " . \ "related routines", __FILE__ ); \ print "$headerText\n\n$outText\n"; #endif LIBRARY/float128/dpml_ux_radian_reduce.c0000644€­ Q00042560000012235114616534611020117 0ustar aakkasintelall/****************************************************************************** Copyright (c) 2007-2024, Intel Corp. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
  IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
  INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
  BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
  OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
  ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
******************************************************************************/

#if defined(MAKE_INCLUDE)
#   define BASE_NAME    rdx
#elif !defined(DPML_UX_RDX_BUILD_FILE_NAME)
#   define DPML_UX_RDX_BUILD_FILE_NAME  dpml_rdx_x.h
#endif

#include "dpml_ux.h"

/*
** This file contains the code for performing radian argument reduction
** for unpacked x-float arguments. The code here is liberally borrowed
** from dpml_trig_reduce.c and assumes the existence of a file that contains
** the bits of 4/pi and appropriate definitions for accessing it. This
** file is denoted by FOUR_OVER_PI_BUILD_FILE_NAME in dpml_names.h.
**
** The reduction routine returns the reduced argument accurate to F_PRECISION +
** EXTRA_PRECISION and the quadrant (modulo 4) that contained the original
** argument. Special cases like infinities and NaNs are assumed to have been
** screened out prior to calling this routine.
*/

#if !defined(EXTRA_PRECISION)
#   define EXTRA_PRECISION      6
#endif

#if !defined(MAKE_INCLUDE)
//#   undef  FOUR_OVER_PI_BUILD_FILE_NAME
#   include STR(DPML_UX_RDX_BUILD_FILE_NAME)
#endif

#define DEFINES
#include STR(FOUR_OVER_PI_BUILD_FILE_NAME)

/*
** BASIC ALGORITHM:
** ----------------
**
** Let z = x + octant*(pi/4). We want to produce
**
**      y = rem( z, pi/2 )
**
** or equivalently,
**
**      Q = nint( z/(pi/2) )
**      y = z - Q*(pi/2)
**
** Note that the reduced argument is in "radians". For computational
** purposes, it is convenient to first obtain the reduced argument in
** cycles - i.e.
** compute y as
**
**      c = z/(pi/2)
**      Q = nint(c)
**      w = c - Q
**      y = w*(pi/2)
**
** If in the above calculations, we substitute x + octant*(pi/4) for z, we get
**
**      c = x/(pi/2) + octant/2
**      Q = nint(c)
**      w = c - Q
**      y = w*(pi/2)
**
** Now, suppose instead of computing c, Q and w, we compute c' = 2*c, Q' = 2*Q
** and w' = 2*w.  Then the above becomes
**
**      c' = x/(pi/4) + octant
**      Q' = 2*nint(c'/2)
**      w' = x/(pi/4) + octant - Q'
**      y  = w'*(pi/4)
**
** We see that the key operation is to compute x/(pi/4).  With this in mind,
** let x = 2^n*f, where 1/2 <= f < 1 and f has P' ( = 128 ) significant bits.
** If F is defined as F = 2^P'*f, it follows that F is an integer.
** Now
**
**      x/(pi/4) = x*(4/pi)
**               = (2^n*f)*(4/pi)
**               = [2^(n-P')]*[2^P'*f] *(4/pi)
**               = [2^(n-P')]*F*(4/pi)
**               = F*{2^(n-P')*(4/pi)}
**
** Suppose that we have stored a large bit string that represents the value
** of 4/pi, then we can obtain the value of 2^(n-P')*(4/pi) by moving the
** binary point in 4/pi by n-P' places.  In particular, let
**
**      2^(n-P')*(4/pi) = J*8 + g
**
** That is, J is an integer formed from the first n-P'-3 bits of 4/pi and
** g is the value formed by the remaining bits.  It follows that
**
**      x/(pi/4) = F*{2^(n-P')*(4/pi)}
**               = F*(J*8 + g)
**               = F*J*8 + F*g
**
** Note that we need only compute x/(pi/4) modulo 8.  Since F and J are both
** integers, the above gives
**
**      x/(pi/4) (mod 8) = (F*J*8 + F*g) (mod 8)
**                       = F*g (mod 8)
**
** At this point the algorithm for large argument reduction has the following
** flavor:
**
**      (1) index into a precomputed bit string for 4/pi to
**          obtain g
**      (2) compute w = F*g (mod 8)
**      (3) w <-- integer part of w + octant (mod 8)
**      (4) Q <-- nint(w)
**      (5) y = w - Q
**      (6) y = y*(pi/4)
**
**              Algorithm I
**              -----------
**
** The following sections describe the implementation issues associated with
** each of the steps in algorithm I as well as present the code for the
** overall implementation.
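**
** Worked example of the "F*g (mod 8)" identity above.  This is an
** illustration only: the toy width P' = 8 is an assumption chosen to keep
** the numbers short (the code itself uses P' = 128), and all decimals are
** rounded.
**
**      x = 819200 = 2^20 * 0.78125, so n = 20, f = 0.78125, F = 2^8*f = 200
**
**      2^(n-P')*(4/pi) = 2^12 * 1.27324... = 5215.189...
**                      = 651*8 + 7.189...      (J = 651, g = 7.189...)
**
**      F*g             = 200 * 7.189...      = 1437.835...
**      F*g (mod 8)     = 1437.835... - 179*8 = 5.835...
**
** Evaluating x/(pi/4) directly gives the same residue:
**
**      x*(4/pi) = 1043037.835... = 130379*8 + 5.835...
**
** so the term F*J*8 can be discarded without affecting the octant or the
** fractional part.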
** ** ** THE 4/pi TABLE ** -------------- ** ** Step (1) of Algorithm I requires indexing into a bit string for 4/pi using ** the exponent field of the argument. Specifically, if n is the argument ** exponent we want to shift the binary point of 4/pi by n - P' bits to the ** right. If |x| < pi/4, there is no need to compute x/(pi/4), so we assume ** that we only index into the table if |x| >= 1/2. Under this assumption, ** it is possible that n - P' is negative. Thus to facilitate the indexing ** operation, it is necessary for the bit string to have some leading 0's. ** ** Assume the bit string for 4/pi has T leading zeros and that the bits are ** numbered in increasing order starting from 0. I.e. the string looks like: ** ** bit number: 0 T ** 00...001.01000101111..... ** ^ ** | ** binary point ** ** From the above discussion, we want to shift the binary point of the bit ** string n-P' bits to the right and extract g as some (as yet undetermined) ** number of bits, starting 3 bits to the left of the shifted binary point. ** Consequently, the position of the most significant bit we would like to ** access is k = T + n - P' - 2. Since we want the bit position to be greater ** than or equal to zero, and we are assuming that the argument is greater ** than or equal to 1/2 (i.e. n >= 0), it follows that T >= P' + 2. */ #if FOUR_OV_PI_ZERO_PAD_LEN < (UX_PRECISION + 2) # error "Insufficient zero padding in 4/pi table" #endif /* ** Since most architectures do not efficiently support bit addressing, the ** argument reduction routine assumes that the 4/pi bit string is stored ** in L-bit "digits". Getting the right bits of 4/pi requires getting the set ** of "digits" that begin with the digit that contains the leading bit and ** doing a sequence of shifts and logical ors. The index of the digit that ** contains the initial bit is trunc(n/L) and the bit position within that ** digit is n - L*trunc(n/L) = n % L. 
** For the unpacked reduction routine,
** we require that the 4/pi table "digit" and a UX_FRACTION_DIGIT have the
** same length (which implies the digit length is either 32 or 64 bits).
*/

#if (BITS_PER_DIGIT != BITS_PER_UX_FRACTION_DIGIT_TYPE)
#   error "Digit type mis-match"
#endif

#define DIGIT_MASK(width,pos)   ((( DIGIT_TYPE_CAST 1 << (width)) - 1) << (pos))
#define DIGIT_BIT(pos)          ( DIGIT_TYPE_CAST 1 << (pos))

#if defined(MAKE_COMMON) || defined(MAKE_INCLUDE)
#define DIGIT_TYPE_CAST         /* MPHOC doesn't do casts */
#else
#define DIGIT_TYPE_CAST         (DIGIT_TYPE)
#endif

#define DIV_REM_BY_L(n,q,r)     (q) = (n) >> __LOG2(BITS_PER_DIGIT); \
                                (r) = (n) & (BITS_PER_DIGIT - 1)

/******************************************************************************/
/*                                                                            */
/*              Generate code for multi-precision multiplication              */
/*                                                                            */
/******************************************************************************/

/*
** Many of the operations used in the radian reduction scheme depend on the
** digit size.  The following code is used to generate macros that hide the
** dependencies on digit size.
*/

#if defined(MAKE_INCLUDE)

    @divert -append divertText

    /*
    ** Record FOUR_OVER_PI_BUILD_FILE_NAME so we don't have to keep specifying
    ** it on the command line.
    */

    printf("#if !defined FOUR_OVER_PI_BUILD_FILE_NAME\n");
    printf("#define FOUR_OVER_PI_BUILD_FILE_NAME\t"
               STR(FOUR_OVER_PI_BUILD_FILE_NAME) "\n");
    printf("#endif\n");

    /*
    ** COMPUTING F*g
    ** -------------
    **
    ** The goal of step (2) in Algorithm I is to produce a reduced argument
    ** that is accurate to P + k bits, where k is the specified number of
    ** extra bits of precision.  Also, we need to get the quadrant bits, Q.
    ** Consequently, the value of w = F*g, must be accurately computed to
    ** P + k + 3 bits.  Note however, that if x is close to a multiple of
    ** pi/2 the reduced argument will have a large number of leading zeros
    ** (in fixed point) and consequently the actual number of required bits
    ** in w will depend upon the input argument.
Since computing w is the ** most time consuming part of the algorithm, we would like to compute ** the minimum number of bits possible. Specifically, compute w to enough ** bits so that if x is not near a multiple of pi/2, then the reduced ** argument will be accurate. After w is computed, we can check how close ** the original argument was to pi/2 by examining the number of leading ** fractional 1's or 0's in w. If there are too many (i.e. the reduced ** argument will not have enough significant bits) then we can compute ** additional bits of w. ** ** In order to compute F*g to P + k + 3 bits, we must perform some form of ** extended precision arithmetic. For the sake of uniformity across data ** types and architectures, the implementation described here computes F*g ** by expressing F and g as fixed point values in "arrays" of some basic ** integer unit of computation. As indicated above, we shall refer to this ** integer unit as a digit. The choice of digit is arbitrary, however, it ** is best if the double length product of two digits is efficiently ** computed. ** ** Now we need to represent w to at least P + k + 3 bits. Since F has P' ** significant bits, if we use a finite precision approximation of g, call ** it g', then the last P' bits of the product F*g' are inaccurate. ** Therefore we need to represent g' to N = P' + P + k + 3 bits. If the ** number of bits in a digit is L, then F and g' must be represented in at ** least ceil(P'/L) and D = ceil(N/L) digits respectively. 
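    **
    ** For example, taking P' = 128 and k = 6 (the default EXTRA_PRECISION)
    ** and assuming P = 113 (quad precision) and L = 64 (these last two
    ** values are assumptions made for illustration, not read from the
    ** build configuration):
    **
    **      N = P' + P + k + 3 = 128 + 113 + 6 + 3 = 250 bits
    **      F  needs     ceil(128/64) = 2 digits
    **      g' needs D = ceil(250/64) = 4 digits
    **      bits in excess of N = 4*64 - 250 = 6
    **
    ** Under these assumptions the statements that follow would produce
    ** num_f_digits = 2, num_w_digits = num_g_digits = 4 and
    ** num_extra_bits = 6.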
*/ num_f_digits = ceil(UX_PRECISION/BITS_PER_DIGIT); num_req_bits = (F_PRECISION + UX_PRECISION + EXTRA_PRECISION + 3); num_w_digits = ceil(num_req_bits/BITS_PER_DIGIT); num_g_digits = num_w_digits; num_extra_bits = num_w_digits*BITS_PER_DIGIT - num_req_bits; printf("#define NUM_F_DIGITS\t%i\n", num_f_digits); printf("#define NUM_G_DIGITS\t%i\n", num_g_digits); printf("#define NUM_W_DIGITS\t%i\n", num_w_digits); printf("#define NUM_REQ_BITS\t%i\n", num_req_bits); printf("#define NUM_EXTRA_BITS\t%i\n", num_extra_bits); print; /* ** Now consider the computation of F*g' in terms of digits. For the ** purpose of discussion, suppose F requires 2 digits and g' requires 4 ** digits. ** Then using "black board" arithmetic F*g' looks like: ** ** binary point ** | ** | ** | ** +--------+--------+--------+--------+ ** g': | g1 | g2 | g3 | g4 | ** +--------+--------+--------+--------+ ** +--------+--------+ ** F: | F1 | F2 | ** +--------+--------+ ** ---------------------------------------------------------- ** | +--------+--------+ ** | | F2*g4 | ** | +--------+--------+--------+ ** | | F1*g4 | ** | +--------+--------+ ** | | F2*g3 | ** +--------+--------+--------+ ** | F1*g3 | ** +--------+--------+ ** | F2*g2 | ** +--------+--------+--------+ ** | F1*g2 | ** +--------+--------+ ** | F2*g1 | ** +--------+--------+--------+ ** | F1*g1 | | ** +--------+--------+ | ** | ** ---------------------------------------------------------- ** +--------+--------+--------+--------+--------+--------+ ** | Not required | w1 | w2 | w3 | w4 | ** +--------+--------+--------+--------+--------+--------+ ** ** Figure 1 ** -------- ** ** The high two digits of the product are not required since we are ** interested in the result modulo 8. ** ** In general the number of digits used to express g' will contain more ** than N bits. Let the number of bits in excess of N be M. 
    ** Then if x is
    ** close to a multiple of pi/2 and the number of leading fractional 0's
    ** or 1's in F*g' is less than M, F*g' still contains enough significant
    ** bits to return an accurate reduced argument.  If we denote the 3 most
    ** significant bits of w1 as o, then x will be close to a multiple of pi/2
    ** if o is odd and the bits below o are 1's, or o is even and the bits
    ** below o are 0's.  Therefore there will be loss of significance if w1
    ** (in the picture above) has a binary representation of the form
    **
    **              +----------------------+
    **              |xx00000...00000xxxxxxx|
    **              +----------------------+
    **                       - or -
    **              +----------------------+
    **              |xx11111...11111xxxxxxx|
    **              +----------------------+
    **                 |<-- M+2 -->|
    **
    ** These two bit patterns can be detected by add and mask operations.
    **
    ** Assuming that M+2 0's or 1's appear in w1, we know that there are not
    ** enough significant bits in w to guarantee the accuracy of the answer.
    ** Consequently, we need to generate more bits of w.  This can be done by
    ** getting the next digit of g, computing the product of that digit with
    ** F and adding it into the previous value of w.  This process can be
    ** repeated until there are a sufficient number of significant bits.  Note
    ** that each additional digit of g will add one digit (L bits) of
    ** significance to w.
    **
    ** If the process of adding additional significant bits is implemented
    ** in a naive fashion, each time through the loop will require an
    ** additional digit of storage.  Consider the situation where the first
    ** additional digit has been added to w and there are still insufficient
    ** significant bits for an accurate result.  This means that there are at
    ** least M + L leading fractional 0's or 1's.  Then w must have the form
    **
    **           |<------------ D + 1 digits ---------->|
    **           +----------+----------+     +----------+
    **           |xx########|######xxxx| ... |xxxxxxxxxx|
    **           +----------+----------+     +----------+
    **              |<-- M+L+2 -->|
    **
    ** where the #'s indicate a string of 0's or 1's.
    ** Since there are more
    ** than L consecutive 0's or 1's, we can compress the representation of w
    ** by one digit by removing L consecutive 0's or 1's from the first two
    ** digits of w.  If this is done w will look like
    **
    **           |<-------------- D digits ------------>|
    **           +----------+----------+     +----------+
    **           |xx#####xxx|xxxxxxxxxx| ... |xxxxxxxxxx|
    **           +----------+----------+     +----------+
    **           -->|M+2|<--
    **
    ** which is the same form as when the first additional digit was added.
    ** It follows that we need storage for only D+1 digits of w and a counter
    ** indicating the number of additional digits that were added.
    **
    ** To recap the above discussion, algorithm I is expanded as follows:
    **
    **      (1)  s <-- 0
    **      (2)  w <-- first D digits of F*g
    **      (3)  if w has less than or equal to M leading fractional
    **           0's or 1's, go to step 9
    **      (4)  add an additional digit of F*g to w
    **      (5)  if w has fewer than L leading fractional 0's
    **           or 1's, go to step 9
    **      (6)  Compact w by removing L 0's or 1's
    **      (7)  s <-- s + 1
    **      (8)  go to step 3.
    **      (9)  o <-- high three bits of w
    **      (10) z' <-- w - nint(w)  (taking into account whatever
    **           compaction took place, i.e. what the current
    **           value of s is.)
    **      (11) y = z'*(pi/4)
    **
    **              Algorithm II
    **              ------------
    **
    ** The above loop has two exits.  An exit from step 3 yields an
    ** approximation to w containing D digits while an exit from step 5
    ** contains D+1 digits.  In the second case, there are fewer than L
    ** leading 0's and 1's and this implies that there are enough "good" bits
    ** in the first D digits to generate the return values.  Consequently,
    ** from either exit, it is sufficient to use only the first D digits of w.
    **
    ** The exposition above on the number of leading zeros was a little loose,
    ** in that for the general case, the leading zeros and ones may not always
    ** lie entirely in the first digit of w.  In general, there can be as many
    ** as L-1 extra bits, in which case, we would need to examine both the
    ** first and second word of w.
However, for the digit sizes we are ** considering combined with the number of extra bits we are returning, ** examining one digit will suffice. */ p = BITS_PER_DIGIT - (num_extra_bits + 4); if (p < 0) { printf("ERROR: mask spans two digits\n"); exit; } else { i = DIGIT_BIT(p); /* to 'add 1' at position p */ m = DIGIT_MASK(num_extra_bits + 1, p + 1); printf("#define W_HAS_M_BIT_LOSS\t" "(((MSD_OF_W + 0x%..16i) & 0x%..16i) == 0)\n", i, m); } /* ** DIGIT ARITHMETIC ** ---------------- ** ** In step (2) of Algorithm 2, we are computing the first D digits of the ** product F*g. From figure 1, we see that, (in general) we are computing ** a 2*L bit product and incorporating it into the sum of previously ** computed 2*L bit products. If we think of F, g and w as multi-digit ** integers with their digits numbered from least significant to most ** significant (starting at zero) and denoting the i-th digit of F by F(i) ** and the j-th digit of g by g(j), then the product in figure 1 can be ** obtained as follows: ** ** t = 0; ** for (i = 0; i < num_g_digits; i++) ** { ** for (j = 0; j < num_F_digits; j++) ** t = t + F[j]*g[i]*2^(j*L) ** w[i] = t mod 2^L; ** t = (t >> L); ** } ** ** Example 1 ** --------- ** ** Note that each time through the loop, t is accumulating the product ** g[i]*F plus "the high digits" of g[i-1]*F. It follows that t can be ** represented in (num_F_digits + 1) digits. ** ** If F contains n digits, then the sum in the above loops looks like: ** ** +--------+ +--------+--------+--------+--------+ +--------+ ** t: | t(n) | ... | t(j+3) | t(j+2) | t(j+1) | t(j) | ... | t(0) | ** +--------+ +--------+--------+--------+--------+ +--------+ ** +--------+--------+ ** + | F[j]*g[i] | ** +--------+--------+ ** -------------------------------------------------------------------- ** +--------+ +--------+--------+--------+--------+ +--------+ ** t: | t'(n) | ... | t'(j+3)| t'(j+2)| t'(j+1)| t'(j) | ... 
|  t(0)  |
    **         +--------+     +--------+--------+--------+--------+     +--------+
    **
    ** Note that t(0) through t(j-1) are unaffected and that t(j+2) through
    ** t(n) are affected only by the carry out when computing t'(j+1).  It
    ** follows that if we keep the carry out of t'(j+1) as a separate quantity,
    ** then the addition in the inner loop only affects two digits of t.  If
    ** we denote the separate carry by c(j), the picture on the next iteration
    ** of the loop (i.e. replace j by j+1) looks like:
    **
    **         +--------+     +--------+--------+--------+--------+     +--------+
    **      t: |  t(n)  | ... | t(j+3) | t(j+2) | t(j+1) |  t(j)  | ... |  t(0)  |
    **         +--------+     +--------+--------+--------+--------+     +--------+
    **                                 +--------+--------+
    **                                 |   F[j+1]*g[i]   |
    **                                 +--------+--------+
    **                                 +--------+
    **                               + |  c(j)  |
    **                                 +--------+
    **      --------------------------------------------------------------------
    **         +--------+     +--------+--------+--------+--------+     +--------+
    **      t':|  t(n)  | ... | t(j+3) | t'(j+2)| t'(j+1)|  t(j)  | ... |  t(0)  |
    **         +--------+     +--------+--------+--------+--------+     +--------+
    **                        +--------+
    **                      + | c(j+1) |
    **                        +--------+
    **
    **                              Figure 2
    **                              --------
    **
    ** The above gives rise to the notion of a multiply/add primitive that has 5
    ** inputs and 3 outputs:
    **
    **      Inputs:     N, M    the most and least significant digits
    **                          of t that are being added to
    **                  C       the carry out from the previous mul/add
    **                  A, B    the two digits that are to be multiplied
    **
    **      Outputs:    C'      the carry out of the final sum
    **                  N',M'   the updated values of N and M.
    **
    ** Recalling that the number of bits per digit is denoted by L, the mul/add
    ** primitive is algebraically defined by:
    **
    **      s  <-- (N + C)*2^L + M + A*B
    **      M' <-- s % 2^L
    **      N' <-- floor(s/2^L) % 2^L
    **      C' <-- floor(s/2^(2*L)) % 2^L
    **
    ** Note that in example 1, there are several special cases of the mul/add
    ** macro which might be faster depending on the values of i and j:
    **
    **           i and j            Special case
    **      ------------------      ---------------------------------
    **      1) i = 0, j = 0         N = M = C = 0, C' = 0
    **      2) i = 0, j < n-1       N = C = 0, C' = 0
    **      3) i = 0, j = n-1       N = C = 0, C' = 0 and N' not needed
    **
    **      4) i > 0, j = 0         C = 0
    **      5) i > 0, j < n-1       general case
    **      6) i > 0, j = n-1       N = 0, C' not needed
    **
    **      7) i + j = n-2          C' not needed
    **      8) i + j = n-1          C, N, C' and N' not needed
    **
    ** Note that cases 3 and 7 are functionally identical.  For purposes of
    ** this discussion we will use the mnemonic XMUL to refer to producing a
    ** 2*L-bit product from 2 L-bit digits and XADD/XADDC to refer to the
    ** addition of one 2*L-bit integer to another without/with producing a
    ** carry out.  With this naming convention we denote the following 6
    ** mul/add operations that correspond to the 6 special cases as follows:
    **
    **      case    mul/add operator name
    **      ----    ---------------------
    **       1)     XMUL(A,B, N',M')
    **       2)     XMUL_ADD(A,B,M,N',M')
    **       3)     MUL_ADD(A,B,M,M')
    **       4)     XMUL_XADDC(A,B,N,M,C',N',M')
    **       5)     XMUL_XADDC_W_C_IN(C,A,B,N,M,C',N',M')
    **       6)     XMUL_XADD_W_C_IN(N,M,C,A,B,C',N',M')
    **
    ** [XMUL_XADD_W_C_IN is described with more parameters than are actually
    ** used.]
    ** [There are 8 cases, two of which are "functionally identical".  That
    ** leaves 7 cases, but only 6 have a "mul/add operator name".]
    **
    ** The mphoc code following these comments generates macros for computing
    ** the initial multiplication of F*g as a function of the number of digits
    ** in both F and g.
It assumes that NUM_F_DIGITS <= NUM_G_DIGITS ** ** ** ** The description of digit arithmetic above indicates that we need ** NUM_F_DIGITS + 1 temporary locations to hold the intermediate products ** and sums plus one extra for dealing with carries. For adding ** additional digits of the product F*g, we need at least 3 temporary ** locations. */ num_t_digits = max(3, num_f_digits + 2); /* ** Print macros for declaring the appropriate number of digits */ # define PRINT_DECL_DEF(tag,name,k) \ /* define 'name'0 thru 'name''k-1' */ \ printf("#define " tag STR(name) "0"); \ for (i = 1; i < k; i++) printf(", " STR(name) "%i", i); \ printf("\n") PRINT_DECL_DEF("G_DIGITS\t", g, num_g_digits); PRINT_DECL_DEF("F_DIGITS\t", F, num_f_digits); PRINT_DECL_DEF("TMP_DIGITS\t", t, num_t_digits); # undef PRINT_DECL_DEF print; /* ** Print macros for referencing the most significant digits of F and g ** as well as declaring the high temporary as the carry digit. */ printf("#define MSD_OF_W\tg%i\n", num_w_digits - 1); printf("#define LSD_OF_W\tg%i\n", num_w_digits - 1 - num_f_digits); printf("#define SECOND_MSD_OF_W\tg%i\n", num_w_digits - 2); printf("#define CARRY_DIGIT\tt%i\n", num_t_digits - 1); print; /* ** GET_F_DIGITS(x) fetches the initial digits of f from x. We assume ** that num_f_digits has the same value as NUM_UX_FRACTION_DIGITS ** ** PUT_W_DIGITS(x) stores the result digits into an UX_FLOAT fraction ** field. 
*/ if (num_f_digits != NUM_UX_FRACTION_DIGITS) { printf("ERROR: num_f_digits != NUM_UX_FRACTION_DIGITS\n"); exit; } # define sMAC2 "; \\\n\t" # define MAC2 " \\\n\t" # define MAC3 "\n\n" printf("#define GET_F_DIGITS(x)" ); for (i = 0; i < num_f_digits; i++) printf( sMAC2 "F%i = G_UX_FRACTION_DIGIT(x, %i)", NUM_UX_FRACTION_DIGITS - 1 - i, i); printf(MAC3); printf("#define PUT_W_DIGITS(x)" ); for (i = 0; i < num_f_digits; i++) printf( sMAC2 "P_UX_FRACTION_DIGIT(x, %i, g%i)", i, num_g_digits - 1 - i); printf(MAC3); /* ** NEGATE_W negates the high num_f_digits + 1 digits of w */ printf("#define NEGATE_W {" ); j = num_g_digits; for (i = 0; i <= num_f_digits; i++) { j--; printf( " \\\n\t" "g%i = ~g%i;", j, j); } printf( " \\\n\t" "g%i += 1; CARRY_DIGIT = (g%i == 0);", j, j); for (i = 1; i < num_f_digits; i++) { j++; printf(" \\\n\t" "g%i += CARRY_DIGIT; CARRY_DIGIT = (g%i == 0);", j, j); } printf(" \\\n\t" "g%i += CARRY_DIGIT; }\n\n", j + 1); /* ** GET_G_DIGITS_FROM_TABLE fetches the initial digits of g ** (and the extra_digit) from the table. */ printf("#define GET_G_DIGITS_FROM_TABLE(p, extra_digit)"); /* Better performance with DEC C -- don't auto-increment! 
*/ for (i = num_g_digits - 1; i >= 0; i--) printf(MAC2 "g%i = p[%i]; ", i, num_g_digits - 1 - i); printf(MAC2 "extra_digit = p[%i]; ", num_g_digits); printf(MAC2 "p += %i", num_g_digits + 1); printf(MAC3); /* ** Generate macro that aligns g bits ** ** LEFT_SHIFT_G_DIGITS(lshift,BITS_PER_WORD-lshift,extra_digit) == ** g = (g << lshift) | (extra_digit >> (BITS_PER_WORD-lshift) **/ printf("#define LEFT_SHIFT_G_DIGITS(lshift, rshift, extra_digit)"); for (i = num_g_digits - 1; i > 0; i--) printf(MAC2 "g%i = (g%i << (lshift)) | (g%i >> (rshift));", i, i, i-1); printf(MAC2 "g0 = (g0 << (lshift)) | (extra_digit >> (rshift))"); printf(MAC3); /* ** MULTIPLY_F_AND_G_DIGITS(c) == g = F* g */ printf("#define MULTIPLY_F_AND_G_DIGITS(c)"); if (num_g_digits == 1) printf("\t" "g0 = F0*g0\n"); else if (num_f_digits == 1) { printf(MAC2 "XMUL(F0,g0,t0,g0)"); for (i = 1; i < num_w_digits - 1; i++) printf(sMAC2 "XMUL_ADD(F0,g%i,t0,t0,g%i)", i, i); printf(sMAC2 "MUL_ADD(F0,g%i,t0,g%i)", i, i); } else { /* Get first product */ printf(MAC2 "XMUL(g0,F0,t1,t0)"); /* ** Accumulate additional products until we use up all of the F ** digits, or we no longer need the high digit of the XMUL. */ msd_of_mul_add = 1; for (i = 1; i < num_f_digits; i++) { msd_of_mul_add++; if (msd_of_mul_add >= num_w_digits) break; printf(sMAC2 "XMUL_ADD(g0,F%i,t%i,t%i,t%i)", i, i, i+1, i); } /* ** If we no longer needed the high digit of the XMUL before using ** all of the F digits, add in the low bits of the final product. */ if (msd_of_mul_add >= num_w_digits) printf(sMAC2 "MUL_ADD(g0,F%i,t%i)", i, i); /* Move the low bits of t to w */ printf(sMAC2 "g0 = t0"); /* ** Now multiply by the remaining digits of g. In the code that ** follows, the digits of t are reused each time through the loop ** modulo (NUM_F_DIGITS + 1). For example, suppose NUM_F_DIGITS ** is 3. In the multiplications above, the digits of t (in most to ** least significant order were t[3]:t[2]:t[1]:t[0]. 
In the first ** iterations below the order is t[0]:t[3]:t[2]:t[1], and on the ** next iteration t[1]:t[0]:t[3]:t[2], and so on. The variables ** hi, lo and first are used to track the order of the digits and ** the least significant digit. Note that the high tmp digit is ** used as a carry digit. */ for (i = 0; i < num_t_digits - 1; i++) next_index[i] = i + 1; next_index[num_t_digits - 2] = 0; # define UPDATE_DIGIT_INDEX(lo,hi) lo = hi; hi = next_index[hi] first = 0; for (i = 1; i < num_w_digits; i++) { first = next_index[first]; lo = first; hi = next_index[lo]; msd_of_mul_add = i + 2; /* msd is the carry out */ if (msd_of_mul_add < num_w_digits) printf(sMAC2 "XMUL_XADDC(g%i,F0,t%i,t%i,c,t%i,t%i)", i, hi, lo, hi, lo); else if (msd_of_mul_add <= num_w_digits) printf(sMAC2 "XMUL_XADD(g%i,F0,t%i,t%i,t%i,t%i)", i, hi, lo, hi, lo); else printf(sMAC2 "MUL_ADD(g%i,F0,t%i,t%i)", i, lo, lo); UPDATE_DIGIT_INDEX(lo,hi); for (j = 1; j < num_f_digits; j++) { msd_of_mul_add++; if (msd_of_mul_add < num_w_digits) { if (j == (num_f_digits - 1)) printf(sMAC2 "XMUL_XADDC(g%i,F%i,c,t%i,c,t%i,t%i)", i, j, lo, hi, lo); else printf(sMAC2 "XMUL_XADDC_W_C_IN(g%i,F%i,t%i,t%i,c,c,t%i,t%i)", i, j, hi, lo, hi, lo); } else if (msd_of_mul_add <= num_w_digits) { if (j == (num_f_digits - 1)) printf(sMAC2 "XMUL_XADD(g%i,F%i,c,t%i,t%i,t%i)", i, j, lo, hi, lo); else printf(sMAC2 "XMUL_XADD_W_C_IN(g%i,F%i,t%i,t%i,c,t%i,t%i)", i, j, hi, lo, hi, lo); } else if (msd_of_mul_add <= num_w_digits + 1) { printf(sMAC2 "MUL_ADD(g%i,F%i,t%i,t%i)", i, j, lo, lo); } else break; UPDATE_DIGIT_INDEX(lo,hi); } /* Move low digit of t to W */ printf(sMAC2 "g%i = t%i", i, first); } } print; print; /* ** Generate the macro that multiplies F by an additional digit of g ** and adds the product to w. 
*/ printf("#define GET_NEXT_PRODUCT(g, w, c)"); if (num_g_digits == 1) printf("\t" "XMUL_XADD(g,F0,g0,w,g0,w)"); else { printf(MAC2 "XMUL_XADDC(g,F0,g0,(DIGIT_TYPE)0,c,g0,w)"); msd_of_mul_add = 1; for (i = 1; i < num_f_digits; i++) { j = i-1; if (msd_of_mul_add < num_w_digits) printf(sMAC2 "XMUL_XADDC_W_C_IN(g,F%i,g%i,g%i,c,c,g%i,g%i)", i, i, j, i, j); else if (msd_of_mul_add <= num_w_digits + 1) printf(sMAC2 "XMUL_XADD_W_C_IN(g,F%i,g%i,g%i,c,g%i,g%i)", i, i, j, i, j); else if (msd_of_mul_add <= num_w_digits + 2) printf(sMAC2 "MUL_ADD(g,F%i,g%i,g%i)", i, j, j); else break; msd_of_mul_add++; } printf(";"); /* ** If there was a carry out on the last add and we are not past the ** last w digit, then the carry has to be propagated to the remaining ** w digits as necessary. */ if (msd_of_mul_add < num_w_digits) { if (msd_of_mul_add != (num_w_digits - 1)) { printf(MAC2 "if (c) "); i = msd_of_mul_add; while (i < num_w_digits - 1) printf(MAC2 "if (++g%i == 0) ", i++); printf(MAC2 "g%i++", i); } else printf(MAC2 "g%i += c", i); } } printf(MAC3); /* Generate the macro that shifts w left by 1 digit */ printf("#define LEFT_SHIFT_W_LOW_DIGITS_BY_ONE(extra_w_digit)"); if (num_w_digits != 1) { for (i = num_w_digits - 2; i > 0; i--) printf(MAC2 "g%i = g%i;", i, i-1); printf(MAC2 "g0 = extra_w_digit"); } printf(MAC3); print; @end_divert @eval my $outText = MphocEval( GetStream( "divertText" ) ); \ my $headerText = GetHeaderText( STR(BUILD_FILE_NAME), \ "Definitions and constants for large " . \ "radian argument reduction",__FILE__ ); \ print "$headerText\n\n$outText"; #endif #define TMP_DIGIT t0 #define EXTRA_W_DIGIT t1 static U_WORD UX_RADIAN_REDUCE( UX_FLOAT * x, WORD octant, UX_FLOAT * reduced_argument ) { WORD offset, scale, j; UX_EXPONENT_TYPE exponent; UX_SIGN_TYPE sign, sign_x; DIGIT_TYPE quadrant; DIGIT_TYPE F_DIGITS; /* declare F0, ... Fm */ DIGIT_TYPE G_DIGITS; /* declare g0, ... gn */ DIGIT_TYPE TMP_DIGITS; /* declare t0, ... 
tm+1 */
    DIGIT_TYPE next_g_digit;
    const DIGIT_TYPE *p;

    /*
    ** Get the fractional part of x into the fraction digits F.
    */

    GET_F_DIGITS(x);

    /*
    ** Assuming the input argument x has the form x = 2^n*f, where .5 <= f < 1,
    ** then F at this point is a multi-precision integer, F = 2^128*f
    **
    ** Now, use the exponent to get the bit offset of the first interesting
    ** bit in the 4/pi table.
    */

    exponent = G_UX_EXPONENT(x);
    sign_x = G_UX_SIGN(x);

    /*
    ** A negative offset would have us access memory before the start of
    ** the 4/pi table.  This indicates that x was already small, so we
    ** make a quick exit.
    */

    if (exponent < 0)
        {
        /*
        ** At this point the argument has absolute value less than pi/4.
        ** We need to compute the quadrant bits based on octant and possibly
        ** adjust x by +/- pi/4.
        **
        ** If x < 0, then x + octant lies in octant - 1, not octant.
        */

        j = octant + (sign_x >> (BITS_PER_UX_SIGN_TYPE - 1));

        /*
        ** We can now get the actual quadrant by looking at the parity of the
        ** effective octant.  Depending on whether we round up or down, we
        ** might need to adjust x by +/- pi/4.
        */

        j = j + (j & 1);
        quadrant = j >> 1;
        j = octant - j;
        if ( j )
            ADDSUB(x, UX_PI_OVER_FOUR, j < 0 ? SUB : ADD, reduced_argument);
        else
            UX_COPY(x, reduced_argument);
        return quadrant;
        }

    /*
    ** Get the address of the digit containing the first interesting bit,
    ** and its bit offset within that digit.  Load G from the table,
    ** shifting the digits by that bit offset, so that the interesting bit
    ** will become the high bit of G.
    */

    offset = exponent - ( UX_PRECISION + 2 - FOUR_OV_PI_ZERO_PAD_LEN );
    DIV_REM_BY_L(offset, j, offset);
    p = &FOUR_OVER_PI_TABLE_NAME[j];

    GET_G_DIGITS_FROM_TABLE(p, next_g_digit);

    if (offset)
        {
        j = BITS_PER_DIGIT - offset;
        LEFT_SHIFT_G_DIGITS(offset, j, next_g_digit);
        }

    /*
    ** The extended-precision multiply:  w = F*g.
    */

    MULTIPLY_F_AND_G_DIGITS( /* F_DIGITS, G_DIGITS, T_DIGITS, */ CARRY_DIGIT );

    /*
    ** Add in the variable octant.
    */

    octant = sign_x ?
-octant : octant;
    MSD_OF_W += (DIGIT_TYPE)octant << (BITS_PER_DIGIT - 3);

    scale = 0;
    do
        {
        /*
        ** If there isn't enough significance in w, then:
        ** get more bits from the table, form the new digit into TMP_DIGIT,
        ** and add the partial product F*TMP_DIGIT to w.
        */

        if ( !W_HAS_M_BIT_LOSS )
            break;

        TMP_DIGIT = next_g_digit;
        next_g_digit = *p++;
        if (offset)
            TMP_DIGIT = (TMP_DIGIT << offset) | (next_g_digit >> j);
        GET_NEXT_PRODUCT(TMP_DIGIT, EXTRA_W_DIGIT, CARRY_DIGIT);

        /*
        ** We're done if there are fewer than L bits of 0's or 1's.
        */

        TMP_DIGIT = ( SECOND_MSD_OF_W >> (BITS_PER_DIGIT - NUM_EXTRA_BITS - 3))
                    | (MSD_OF_W << (NUM_EXTRA_BITS + 3));
        TMP_DIGIT ^= ((SIGNED_DIGIT_TYPE) TMP_DIGIT >> (BITS_PER_DIGIT - 1));
        if ( TMP_DIGIT )
            break;

        /*
        ** Compress the current value of w and increment scale to reflect
        ** the compression
        */

#       define OCTANT_MASK MAKE_MASK(3, BITS_PER_DIGIT - 3)

        MSD_OF_W = (MSD_OF_W & OCTANT_MASK) | (SECOND_MSD_OF_W & ~OCTANT_MASK);
        LEFT_SHIFT_W_LOW_DIGITS_BY_ONE(EXTRA_W_DIGIT);
        EXTRA_W_DIGIT = 0;
        scale += BITS_PER_DIGIT;
        }
    while (1);

    /*
    ** "Sign extend" w and get the quadrant.  In the process, if the MSD_OF_W
    ** is "all" 0's or 1's, we need to shift up one digit in order to ensure
    ** the proper number of significant bits in the final result.
    */

    quadrant = MSD_OF_W;
    MSD_OF_W = MSD_OF_W << 2;
    MSD_OF_W = ((SIGNED_DIGIT_TYPE) MSD_OF_W) >> 2;
    TMP_DIGIT = MSD_OF_W;
    quadrant -= MSD_OF_W;

    if ( MSD_OF_W == ((SIGNED_DIGIT_TYPE) MSD_OF_W >> (BITS_PER_DIGIT - 1)) )
        {
        MSD_OF_W = SECOND_MSD_OF_W;
        LEFT_SHIFT_W_LOW_DIGITS_BY_ONE(EXTRA_W_DIGIT);
        scale += BITS_PER_DIGIT;
        }

    /*
    ** If the sign bit of the original MSD of w is set, then "negate" the
    ** result
    */

    sign = ((SIGNED_DIGIT_TYPE) TMP_DIGIT) < 0 ? UX_SIGN_BIT : 0;
    if (sign)
        NEGATE_W

    /*
    ** Put w into unpacked format and normalize.  Make up for any zero bits
    ** that were shifted in during the normalization.
    ** Note that, by the way the
    ** reduced argument was constructed, the normalization shift cannot be
    ** bigger than the digit size.
    */

    quadrant = G_UX_SIGN(x) ? -quadrant : quadrant;
    P_UX_SIGN(reduced_argument, sign ^ sign_x);
    P_UX_EXPONENT(reduced_argument, 3);
    PUT_W_DIGITS(reduced_argument);
    NORMALIZE(reduced_argument);
    exponent = G_UX_EXPONENT(reduced_argument);
    offset = exponent - 3;
    if (offset)
        {
        offset += BITS_PER_DIGIT;
        TMP_DIGIT = G_UX_LSD( reduced_argument);
        TMP_DIGIT |= (LSD_OF_W >> offset);
        P_UX_LSD(reduced_argument, TMP_DIGIT);
        }
    P_UX_EXPONENT(reduced_argument, exponent - scale);

    MULTIPLY(reduced_argument, UX_PI_OVER_FOUR, reduced_argument);

    return quadrant >> (BITS_PER_DIGIT - 2);
    }

LIBRARY/float128/dpml_sqrt.c
/******************************************************************************
  Copyright (c) 2007-2024, Intel Corp.
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are met:

    * Redistributions of source code must retain the above copyright notice,
      this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    * Neither the name of Intel Corporation nor the names of its contributors
      may be used to endorse or promote products derived from this software
      without specific prior written permission.

  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  ARE DISCLAIMED.
  IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
  THE POSSIBILITY OF SUCH DAMAGE.
******************************************************************************/

#if defined(FAST_SQRT) + defined(SQRT) + defined(RSQRT) + defined(MAKE_INCLUDE) != 1
#   error Exactly one of SQRT, FAST_SQRT, RSQRT, or MAKE_INCLUDE must be defined.
#endif

#if defined(FAST_SQRT)
#   define __ENTRY_NAME F_FAST_SQRT_NAME
#   define ___BASE_NAME FAST_SQRT_BASE_NAME
#   define IF_SQRT(x) x
#   define IF_RSQRT(x)
#elif defined(RSQRT)
#   define __ENTRY_NAME F_RSQRT_NAME
#   define ___BASE_NAME RSQRT_BASE_NAME
#   define IF_SQRT(x)
#   define IF_RSQRT(x) x
#elif defined(SQRT)
#   define __ENTRY_NAME F_SQRT_NAME
#   define ___BASE_NAME SQRT_BASE_NAME
#   define IF_SQRT(x) x
#   define IF_RSQRT(x)
#endif

#if !defined(F_ENTRY_NAME)
#   define F_ENTRY_NAME __ENTRY_NAME
#endif

#if !defined(BASE_NAME)
#   define BASE_NAME ___BASE_NAME
#endif

#if !defined(BUILD_FILE_EXTENSION)
#   define BUILD_FILE_EXTENSION c
#endif

#include "dpml_private.h"

#if (DYNAMIC_ROUNDING_MODES) || (COMPILER == epc_cc)
#   define ESTABLISH_ROUND_TO_ZERO(old_mode) \
        INIT_FPU_STATE_AND_ROUND_TO_ZERO(old_mode)
#   define RESTORE_ROUNDING_MODE(old_mode) \
        RESTORE_FPU_STATE(old_mode)
#else
#   define ESTABLISH_ROUND_TO_ZERO(old_mode)
#   define RESTORE_ROUNDING_MODE(old_mode)
#endif

#if !defined(F_MUL_CHOPPED)
    /* This definition of F_MUL_CHOPPED is used for dynamic rounding modes
       and when no directed rounding is available. In the latter case results
       will not be correctly rounded.
*/ # define F_MUL_CHOPPED(x,y,z) (z) = (x) * (y) #endif /* ** NUM_FRAC_BITS specifies the number of mantissa bits used for ** indexing the table (the table index also includes the low-order ** exponent bit). NUM_FRAC_BITS also affects the table size: ** ** sizeof(D_SQRT_TABLE_NAME) = (1 << (NUM_FRAC_BITS + 1)) ** * (2*sizeof(float)+sizeof(double)) */ #define NUM_FRAC_BITS 7 #define INDEX_MASK MAKE_MASK((NUM_FRAC_BITS + 1), 0) #if (IEEE_FLOATING) /* ** LOC_OF_EXPON is the bit offset within u.B_SIGNED_HI_32 of the ** low-order exponent bit of u.f, where u is a B_UNION. (We assume ** the highest bits of B_SIGNED_HI_32 hold the sign bit and exponent). ** ** From LOC_OF_EXPON, EXP_BITS_OF_ONE_HALF and HI_EXP_BIT_MASK are derived. */ # define LOC_OF_EXPON ((BITS_PER_LS_INT_TYPE - 1) - B_EXP_WIDTH) # define EXP_BITS_OF_ONE_HALF ((U_LS_INT_TYPE)(B_EXP_BIAS-B_NORM-1) << LOC_OF_EXPON) # define HI_EXP_BIT_MASK (MAKE_MASK(B_EXP_WIDTH-1, 1) << LOC_OF_EXPON) # define GET_SQRT_TABLE_INDEX(exp,index) \ index = (exp >> (LOC_OF_EXPON - NUM_FRAC_BITS)); \ index &= INDEX_MASK /* ** SAVE_EXP saves the exponent in a temporary so it can be used in ** the INPUT_IS_ABNORMAL macro */ # define SAVE_EXP(exp) save_exp = (exp) # define INPUT_IS_ABNORMAL \ ((U_LS_INT_TYPE)(save_exp-((LS_INT_TYPE)1 << LOC_OF_EXPON)) >= \ (U_LS_INT_TYPE)hi_exp_mask) #endif #if (VAX_FLOATING) # define EXP_BITS_OF_ONE_HALF 0x4000 # define HI_EXP_BIT_MASK 0x7fe0 # define GET_SQRT_TABLE_INDEX(exp,index) \ index = ((exp << 3) | ((U_INT_32)exp >> 29)); \ index &= INDEX_MASK # define SAVE_EXP(exp) /* INPUT_IS_ABNORMAL doesn't need it */ # define INPUT_IS_ABNORMAL (x <= (F_TYPE)0.0) #endif #if ((ARCHITECTURE == alpha) || (BITS_PER_WORD == 64)) /* We can do 64-bit stores */ /* This is an optimization of the 'else' clause below */ # if QUAD_PRECISION # define STORE_EXP_TO_V_UNION \ V_UNION_128_BIT_STORE # else # define STORE_EXP_TO_V_UNION \ V_UNION_64_BIT_STORE # endif #else /* Store it in 32-bits pieces */ # if 
QUAD_PRECISION # define STORE_EXP_TO_V_UNION \ v.B_SIGNED_HI_32 = ((U_INT_32)exp) >> 1; \ v.B_SIGNED_LO1_32 = 0; \ v.B_SIGNED_LO2_32 = 0; \ v.B_SIGNED_LO3_32 = 0 # else # define STORE_EXP_TO_V_UNION \ v.B_SIGNED_HI_32 = ((U_INT_32)exp) >> 1; \ v.B_SIGNED_LO_32 = 0 # endif #endif /* This condition is complicated. */ #if (VAX_FLOATING) == (ENDIANESS == little_endian) # define V_UNION_64_BIT_STORE \ v.B_UNSIGNED_HI_64 = ((U_INT_64)(U_INT_32)exp) >> 1 # define V_UNION_128_BIT_STORE \ v.B_UNSIGNED_HI_64 = ((U_INT_64)(U_INT_32)exp) >> 1; \ v.B_UNSIGNED_LO_64 = 0 #elif ((ARCHITECTURE == alpha) && defined(HAS_LOAD_WRONG_STORE_SIZE_PENALTY)) # define V_UNION_64_BIT_STORE \ v.B_UNSIGNED_HI_64 = ((U_WORD)exp) >> 1 # define V_UNION_128_BIT_STORE \ v.B_UNSIGNED_HI_64 = ((U_WORD)exp) >> 1; \ v.B_UNSIGNED_LO_64 = 0 #else # define V_UNION_64_BIT_STORE \ v.B_UNSIGNED_HI_64 = ((U_INT_64)(U_INT_32)exp) << 31 # define V_UNION_128_BIT_STORE \ v.B_UNSIGNED_HI_64 = ((U_INT_64)(U_INT_32)exp) << 31; \ v.B_UNSIGNED_LO_64 = 0 #endif /* ** The definitions of SQRT_COEF_STRUCT and D_SQRT_TABLE_NAME also ** appear in the generated .c file for the table. 
*/ typedef struct { float a, b; double c; } SQRT_COEF_STRUCT; extern const SQRT_COEF_STRUCT D_SQRT_TABLE_NAME[(1<<(NUM_FRAC_BITS+1))]; /* ** SCALE_AND_DO_INDEXED_POLY_APPROX ** ** Inputs: ** x any number ** = f * 2^(2*i+j) ** where 1/2 <= f < 1, integer i and j, ** and j = 0 or 1 ** ignoring f <= 0 ** ** Outputs: ** half_scale = 2^(i-1) (SQRT, F_SQRT) ** ** flah_scale = 2^(1-i) (RSQRT) ** (the name is clear, albeit cute) ** ** scaled_x = f * 2^j ** so 1/2 <= scaled_x < 2 ** ** y ~= 1/sqrt(scaled_x) ** ** so sqrt(x) ~= y * scaled_x * 2 * half_scale ** and 1/sqrt(x) ~= y / (2 * half_scale) ** ** Temporaries: ** u, a, b, c, index */ #define SCALE_AND_DO_INDEXED_POLY_APPROX \ u.f = (B_TYPE)x; \ exp = u.B_HI_LS_INT_TYPE; \ B_COPY_SIGN_AND_EXP((B_TYPE)x, half, y); \ ASSERT( ((0.5 <= y) && (y < 1.0)) ); \ GET_SQRT_TABLE_INDEX(exp,index); \ b = (B_TYPE)D_SQRT_TABLE_NAME[index].b; \ b *= y; \ c = (B_TYPE)D_SQRT_TABLE_NAME[index].c; \ lo_exp_bit_and_hi_frac = exp & ~hi_exp_mask; \ u.B_HI_LS_INT_TYPE = (exp_of_one_half | lo_exp_bit_and_hi_frac); \ c += b; \ scaled_x = u.f; \ ASSERT( (((0.5 <= scaled_x) && (scaled_x < 2.0)) || (scaled_x < 0.0)) ); \ y *= y; \ a = (B_TYPE)D_SQRT_TABLE_NAME[index].a; \ SAVE_EXP(exp); \ IF_SQRT ({ \ exp ^= lo_exp_bit_and_hi_frac; \ exp += exp_of_one_half; \ }) \ IF_RSQRT({ \ exp ^= lo_exp_bit_and_hi_frac; \ exp = 3*exp_of_one_half - exp; \ }) \ y *= a; \ STORE_EXP_TO_V_UNION; \ y += c; \ IF_SQRT ( half_scale = v.f ); \ IF_RSQRT( flah_scale = v.f ); \ /* end of SCALE_AND_DO_INDEXED_POLY_APPROX */ /*----------------------------------------------------------------------------*/ /* Tuckerman's Rounding */ /*----------------------------------------------------------------------------*/ /* ** Tuckerman's rounding is used to compute the correctly rounded sqrt(x). ** It's 'good to the last bit', or more precisely 'to within 1/2 lsb(sqrt(x))'. ** This is a short proof of Tuckerman's rounding. 
** ** Let z be a machine-precision approximation to sqrt(x); then z+lsb(z) is the ** smallest representable number larger than z (NB: z-lsb(z) is the largest ** representable number less than z, _except_ when z is a power of 2). ** Within this proof, let [] represent _truncation_ to machine precision, ** and {} represent _rounding_ to machine precision. ** ** Note that for _any_ y (not necessarily representable in machine precision), ** ** z + 1/2 lsb(z) <= y <==> z < {y}. ** ** For sqrt(x), we never have equality: ** z + 1/2 lsb(z) <= sqrt(x) ==> z + 1/2 lsb(z) < sqrt(x), ** because if they were equal, we'd have: ** (z + 1/2 lsb(z))^2 = x ** which is impossible, because to represent the left hand side requires more ** than twice the machine precision, while the right hand side is representable. ** ** Now the following statements are equivalent in turn: ** ** z < {sqrt(x)} ** z + 1/2 lsb(z) <= sqrt(x) ** z + 1/2 lsb(z) < sqrt(x) ** (z + 1/2 lsb(z))^2 < x ** z (z + 1/2 lsb(z)) < x (the reverse is proved below) ** [ z (z + 1/2 lsb(z)) ] < x. ** ** To complete the reverse of the third inference above, suppose it were false. ** Then: z (z + 1/2 lsb(z)) < x <= (z + 1/2 lsb(z))^2. The left hand side is ** some multiple of 1/2 lsb(z)^2. The right hand side is only larger by ** d = 1/4 lsb(z)^2, so [rhs] = [rhs-d] = [lhs]. But the inequality implies ** [lhs] < x <= [rhs], and we have a contradiction. ** ** In conclusion, ** z < {sqrt(x)} <==> [ z (z + 1/2 lsb(z)) ] < x. */ /* ** Here we cover another question: How closely must y approximate sqrt(x) to ** ensure {y} = {sqrt(x)}, where x is a representable number? We state without ** proof that the closest sqrt(x) approaches a value halfway between consecutive ** representable numbers occurs either when x is just larger than a power of 4, ** or just less than a power of 4. 
** We have:
**
**      sqrt(4^k*(1+lsb( 1 ))) = 2^k*(1 + lsb( 1 )/2 - lsb( 1 )^2/8 + ...), and
**      sqrt(4^k*(1-lsb(1/2))) = 2^k*(1 - lsb(1/2)/2 - lsb(1/2)^2/8 - ...).
**
** So if |y - sqrt(x)| < lsb(sqrt(x))^2/8 - O(lsb^3), {y} = {sqrt(x)}.
** For our purposes, this means that 50-bit accuracy (barely) suffices to
** produce a correctly-rounded 24-bit result, since (2^(1-24))^2/8 = 2^(1-50).
** After our Newton's iteration, we have nearly 53-bit accuracy.  All is well.
*/

/*----------------------------------------------------------------------------*/
/*                          Computing 'x+' and 'x-'                           */
/*----------------------------------------------------------------------------*/

/*
** For Tuckerman's rounding, we need to compute the (machine-)representable
** numbers just after and before a representable x: 'x+' = x + lsb(x) and
** 'x-' = x - lsb(x-lsb(x)).  Letting '{}' denote rounding to machine precision,
** we compute these by:
**
**      'x+' = {x + {c x}}                              (1)
**      'x-' = {x - {c x}}                              (2)
**
** for some appropriate constant c, where neither x+{c x} nor x-{c x} is midway
** between two consecutive representable numbers.
**
** The weakest preconditions that satisfy the above are:
**
**      1/2 lsb(x) < {c x} < 3/2 lsb(x)         (1a), when x != 2^n(1-lsb(1/2))
**      1/2 lsb(x) < {c x} <  2  lsb(x)         (1b), when x  = 2^n(1-lsb(1/2))
**      1/2 lsb(x) < {c x} < 3/2 lsb(x)         (2a), when x != 2^n
**      1/4 lsb(x) < {c x} < 3/4 lsb(x)         (2b), when x  = 2^n
**
** For (1a), (1b), and (2a), we can take:
**
**      1/2 lsb(x)/x < c < 3/2 lsb(x)/x, which we can 'shrink' to simplify:
**      1/2 lsb(1)/1 < c < 3/2 lsb(1)/2
**      1/2 lsb(1)   < c < 3/4 lsb(1)
**
** For (2b), we require:
**
**      1/4 lsb(1)   < c < 3/4 lsb(1)
**
** Thus, in any case, we can use any c in the range:
**
**      1/2 lsb(1)   < c < 3/4 lsb(1)
**
** We choose the midpoint:
**
**      c = 5/8 lsb(1) = 5/8 2^(1-p) = 5/4 2^(-p)
**
** FWIW: It's possible to compute 'x-' by: 'x-' = {x * (1-lsb(1/2))},
** but 'x+' isn't necessarily computed by: 'x+' = {x * (1+lsb(1))}.
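**
** A minimal sketch of formulas (1) and (2) for IEEE single precision (p = 24),
** assuming round-to-nearest and a normal, finite x > 0.  The names C_ULP,
** next_up and next_down are illustrative only and are not part of this
** library; the constant is c = 5/4 2^(-24), the same value as the
** F_PRECISION == 24 setting of ULP_FACTOR below.

```c
#include <assert.h>
#include <float.h>

// c = 5/8 lsb(1) = 5/4 * 2^(-24); FLT_EPSILON is lsb(1) = 2^(-23).
static const float C_ULP = 0.625f * FLT_EPSILON;

// 'x+' = {x + {c x}}: the representable number just after x.
static float next_up(float x)
{
    assert(x >= FLT_MIN && x < FLT_MAX);
    volatile float cx = C_ULP * x;   // volatile forces rounding to float
    volatile float r = x + cx;
    return r;
}

// 'x-' = {x - {c x}}: the representable number just before x.
static float next_down(float x)
{
    assert(x >= FLT_MIN && x <= FLT_MAX);
    volatile float cx = C_ULP * x;
    volatile float r = x - cx;
    return r;
}
```

** For example, next_down(1.0f) lands on 1 - 2^(-24) (case 2b, where the
** spacing below a power of two is halved), while next_up(1.0f) lands on
** 1 + 2^(-23).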
*/ #if defined(SQRT) # if (F_PRECISION == 24) # define ULP_FACTOR (F_TYPE)7.450580596923828125e-8 # elif (F_PRECISION == 53) # define ULP_FACTOR (F_TYPE)1.387778780781445675529539585113525390625e-16 # elif (F_PRECISION == 56) # define ULP_FACTOR (F_TYPE)1.7347234759768070944119244813919067382813e-17 # elif (F_PRECISION == 113) # define ULP_FACTOR (F_TYPE)1.203706215242022408159986214115579574086314e-34 # else # define ULP_FACTOR (F_TYPE)1.25/(F_POW_2(F_PRECISION)) # endif #endif /*----------------------------------------------------------------------------*/ /* Newton's Iteration */ /*----------------------------------------------------------------------------*/ /* Newton's iteration for 1 / (nth root of x) is: y' = y + [ (1 - x * y^n) * y / n ] So, the iteration for 1 / sqrt(x) is: y' = y + [ (1 - x * y^2) * y * 0.5 ] If we want to do one iteration, multiply the result by x, and multiply the result by a scale factor we get: y' = scale * x * ( y + [ (1 - x * y^2) * y * 0.5 ] ) y' = scale * x * y * ( 1 + [ (1 - x * y^2) * 0.5 ] ) y' = scale/2 * x * y * ( 2 + [ (1 - x * y^2) ] ) gives about 5/4 lsb error y' = scale/2 * x * y * ( 3 - x * y^2 ) gives about 8/4 lsb error So iterate to get better 1/sqrt(x) and multiply by x to get sqrt(x). */ /* ** For quad precision, we need additional Newton's iterations. ** For lower precisions, the iteration (if needed) is embedded ** in the ITERATE_AND_MAYBE_CHECK_LAST_BIT macro. 
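**
** A minimal sketch of the iteration for 1/sqrt(x) described above, in plain
** double arithmetic.  The function names and the caller-supplied starting
** guess y0 are assumptions made for illustration; the library takes its
** starting value from the coefficient table and performs the same update
** inside its macros.

```c
#include <assert.h>

// One Newton step for y ~= 1/sqrt(x):  y' = y + (1 - x*y^2) * y * 0.5
static double rsqrt_newton_step(double x, double y)
{
    return y + (1.0 - x * y * y) * y * 0.5;
}

// Refine a rough guess y0 of 1/sqrt(x), then multiply by x to get sqrt(x).
// The iteration converges when 0 < y0 < sqrt(3/x).
static double sqrt_via_rsqrt(double x, double y0, int steps)
{
    assert(x > 0.0 && y0 > 0.0);
    double y = y0;
    for (int i = 0; i < steps; i++)
        y = rsqrt_newton_step(x, y);
    return x * y;
}
```

** Writing y = (1+e)/sqrt(x), one step maps the relative error e to
** -(3/2)e^2 - (1/2)e^3, so the number of correct bits roughly doubles
** with each iteration.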
*/ #if QUAD_PRECISION /* ** NEWTONS_ITERATION ** ** Inputs: ** scaled_x any number ** ignoring scaled_x <= 0 ** ** y ~= 1/sqrt(scaled_x) ** ** Outputs: ** y ~= 1/sqrt(scaled_x) ** y becomes a better approximation ** ** Temporaries: ** a, b, c */ # define NEWTONS_ITERATION \ a = y * scaled_x; \ b = a * y; \ b = one - b; \ b *= y; \ c = y + y; \ c += b; \ y = c * half #else # define NEWTONS_ITERATION #endif /*----------------------------------------------------------------------------*/ /* ITERATE_AND_MAYBE_CHECK_LAST_BIT */ /*----------------------------------------------------------------------------*/ #if 0 /* To make all arms 'elif's */ #elif FAST_SQRT && (F_PRECISION <= 24) /* Don't do a Newton's iteration */ # define ITERATE_AND_MAYBE_CHECK_LAST_BIT \ a = y * scaled_x; \ b = half_scale + half_scale; \ f_type_y = (F_TYPE)(a * b) # define RESULT f_type_y #elif RSQRT && (F_PRECISION <= 24) /* Don't do a Newton's iteration */ # define ITERATE_AND_MAYBE_CHECK_LAST_BIT \ b = flah_scale + flah_scale; \ f_type_y = (F_TYPE)(y * b) # define RESULT f_type_y #elif SQRT && (F_PRECISION <= 24) && (B_PRECISION < 2*F_PRECISION) /* This case is unlikely enough that we will worry about it when we need to (if ever). There is code in older versions of sqrt that does a tuckermans rounding on single prec values. */ # error "We need to worry about it now." #elif SQRT && (F_PRECISION <= 24) && (B_PRECISION >= 2*F_PRECISION) /* Make sure the last bit is correctly rounded by computing a double-precision result, and then rounding it to single. */ # define ITERATE_AND_MAYBE_CHECK_LAST_BIT \ a = y * scaled_x; \ b = a * y; \ c = a * half_scale; \ b = three - b; \ f_type_y = (F_TYPE)(c * b) # define RESULT f_type_y #elif RSQRT /* Do more accurate iteration (about 1 lsb error) */ # define ITERATE_AND_MAYBE_CHECK_LAST_BIT \ c = y * flah_scale; \ f_type_y = (F_TYPE)((c+c)+c*(one-scaled_x*(y*y))); # define RESULT f_type_y #elif RSQRT /* Do sloppy iteration (about 2 lsb error). 
            y = (y * flah_scale) * (three - (y*scaled_x) * y) */

#   define ITERATE_AND_MAYBE_CHECK_LAST_BIT \
        a = y * scaled_x; \
        b = a * y; \
        c = y * flah_scale; \
        b = three - b; \
        y = c * b
#   define RESULT y

#elif FAST_SQRT

    /* Do sloppy iteration (about 2 lsb error).
            y = ((y*scaled_x) * half_scale) * (three - (y*scaled_x) * y) */

#   define ITERATE_AND_MAYBE_CHECK_LAST_BIT \
        a = y * scaled_x; \
        b = a * y; \
        c = a * half_scale; \
        b = three - b; \
        y = c * b
#   define RESULT y

#elif SQRT

    /* Do more accurate iteration and check last bit.
       [ NB: we compute ulp = 2*ULP_FACTOR*c, because y ~= 2*c.] */

#   define DECLARE_old_mode U_WORD old_mode;
#   define DECLARE_ulp_stuff F_TYPE ulp, y_less_1_ulp, y_plus_1_ulp;
#   define ITERATE_AND_MAYBE_CHECK_LAST_BIT \
        a = y * scaled_x; \
        ulp = 2.0*ULP_FACTOR; \
        b = a * y; \
        c = a * half_scale; \
        b = one - b; \
        a = c + c; \
        b = c * b; \
        ulp *= c; \
        y = a + b; \
        y_less_1_ulp = y - ulp; \
        ASSERT( y_less_1_ulp < y ); \
        y_plus_1_ulp = y + ulp; \
        ASSERT( y_plus_1_ulp > y ); \
        ESTABLISH_ROUND_TO_ZERO(old_mode); \
        F_MUL_CHOPPED(y, y_less_1_ulp, a); \
        F_MUL_CHOPPED(y, y_plus_1_ulp, b); \
        RESTORE_ROUNDING_MODE(old_mode); \
        y = ((a >= x) ? y_less_1_ulp : y); \
        y = ((b < x) ? y_plus_1_ulp : y);
#   define RESULT y

#else
#   error "Can't define ITERATE_AND_MAYBE_CHECK_LAST_BIT"
#endif

#ifndef DECLARE_old_mode
#define DECLARE_old_mode
#endif

#ifndef DECLARE_ulp_stuff
#define DECLARE_ulp_stuff
#endif

/*----------------------------------------------------------------------------*/
/*                           The Function Itself!
*/ /*----------------------------------------------------------------------------*/ F_TYPE F_ENTRY_NAME(F_TYPE x) { EXCEPTION_RECORD_DECLARATION B_UNION u, v; F_TYPE f_type_y; B_TYPE y, a, b, c; B_TYPE scaled_x; B_TYPE IF_SQRT (half_scale) IF_RSQRT(flah_scale); const B_TYPE half = (B_TYPE)0.5; const B_TYPE one = (B_TYPE)1.0; const B_TYPE three = (B_TYPE)3.0; DECLARE_old_mode DECLARE_ulp_stuff LS_INT_TYPE exp, save_exp; U_LS_INT_TYPE index; U_LS_INT_TYPE lo_exp_bit_and_hi_frac; U_LS_INT_TYPE hi_exp_mask = HI_EXP_BIT_MASK; U_LS_INT_TYPE exp_of_one_half = EXP_BITS_OF_ONE_HALF; #if defined(HAS_SQRT_INSTRUCTION) && ( FAST_SQRT || SQRT ) && ( SINGLE_PRECISION || DOUBLE_PRECISION ) u.f = (B_TYPE)x; save_exp = u.B_HI_LS_INT_TYPE; if INPUT_IS_ABNORMAL goto abnormal_input; F_HW_SQRT(x,RESULT); return RESULT; #else SCALE_AND_DO_INDEXED_POLY_APPROX; if INPUT_IS_ABNORMAL goto abnormal_input; NEWTONS_ITERATION; NEWTONS_ITERATION; ITERATE_AND_MAYBE_CHECK_LAST_BIT; return RESULT; #endif abnormal_input: #if VAX_FLOATING /* x is either 0 or negative */ if (x == (F_TYPE)0.0) { #if RSQRT GET_EXCEPTION_RESULT_1(RSQRT_OF_POS_ZERO, x, RESULT); #else RESULT = x; #endif } else { GET_EXCEPTION_RESULT_1(SQRT_OF_NEGATIVE, x, RESULT); } return RESULT; #elif (IEEE_FLOATING) F_CLASSIFY(x, index); switch (index) { case F_C_SIG_NAN: case F_C_QUIET_NAN: RESULT = x; return RESULT; break; #if RSQRT case F_C_POS_INF: RESULT = (F_TYPE)0.0; return RESULT; break; case F_C_POS_ZERO: GET_EXCEPTION_RESULT_1(RSQRT_OF_POS_ZERO, x, RESULT); return RESULT; break; case F_C_NEG_ZERO: GET_EXCEPTION_RESULT_1(RSQRT_OF_NEG_ZERO, x, RESULT); return RESULT; break; #else case F_C_POS_INF: case F_C_POS_ZERO: case F_C_NEG_ZERO: RESULT = x; return RESULT; break; #endif case F_C_NEG_INF: case F_C_NEG_NORM: case F_C_NEG_DENORM: GET_EXCEPTION_RESULT_1(SQRT_OF_NEGATIVE, x, RESULT); return RESULT; break; default: /* must be positive denorm */ F_MAKE_FLOAT( ((WORD) (2*F_PRECISION + 1) << F_EXP_POS), f_type_y); 
        F_COPY_SIGN_AND_EXP(x, f_type_y, x);
        x -= f_type_y;
#if defined(HAS_SQRT_INSTRUCTION) && ( FAST_SQRT || SQRT ) && ( SINGLE_PRECISION || DOUBLE_PRECISION )
        F_HW_SQRT(x,RESULT);
#else
        SCALE_AND_DO_INDEXED_POLY_APPROX;
        NEWTONS_ITERATION;
        NEWTONS_ITERATION;
        ITERATE_AND_MAYBE_CHECK_LAST_BIT;
#endif
        /* Scale down again (up for RSQRT) */
        IF_SQRT ( SUB_FROM_EXP_FIELD(RESULT, F_PRECISION) );
        IF_RSQRT( ADD_TO_EXP_FIELD(RESULT, F_PRECISION) );
        return RESULT;
        break;
    }
#endif

}   /* sqrt */

/*----------------------------------------------------------------------------*/
/*                      MPHOC code to generate the table                      */
/*----------------------------------------------------------------------------*/

#if MAKE_INCLUDE

#undef F_NAME_SUFFIX
#define F_NAME_SUFFIX TABLE_SUFFIX

@divert divertText

    /*
    ** Print header information.
    */

    print;
    print "#include \"dpml_private.h\"";
    print;
    print "#define NUM_FRAC_BITS ", STR(NUM_FRAC_BITS);
    print;

    /*
    ** The definitions of SQRT_COEF_STRUCT and D_SQRT_TABLE_NAME also
    ** appear in the code.
    */

    print "typedef struct {";
    print " float a, b;";
    print " double c;";
    print "} SQRT_COEF_STRUCT;";
    print;
    print "const SQRT_COEF_STRUCT D_SQRT_TABLE_NAME[(1<<(NUM_FRAC_BITS+1))] = {";
    print;

    /*
    ** Generate and print the polynomial coefficients.
    */

    function rsqrt_f(r) {
        return 1/sqrt(r);
    }

    precision = ceil( (D_PRECISION + 16)/MP_RADIX_BITS );

    /*
    ** For each half of the table, ...
    */

    for (h = 1; h <= 2; h++) {
        xaa = 0.5;
        xbb = 1.0;
        xkk = 1.0/h;
        print;
        printf("/*\n**\t");
        printf("a*x^2 + b*x + c");
        printf(" ~= sqrt(%5r/x),\t\t%5r <= x < %5r", xkk, xaa, xbb);
        printf("\n*/\n");

        for (i = 0; i < 2^NUM_FRAC_BITS; i++) {

            xa = xaa + (xbb-xaa) * i /2^NUM_FRAC_BITS;
            xb = xaa + (xbb-xaa) * (i+1)/2^NUM_FRAC_BITS;

            /*
            ** Determine a minimum-error quadratic approximation to
            ** sqrt(xkk/x) in the range xa <= x <= xb.
            ** (This doesn't
            ** minimize the error after a Newton's iteration; that'd
            ** require a weighting function of x^(1/4), a needless
            ** complication for this single-precision approximation).
            */

            tol = S_PRECISION+2;
            flags = 0;
            err = remes(flags, xa, xb, rsqrt_f, tol, &degree, &rsqrt_c);
            if (degree != 2)
                print("*** degree = %i\n", degree);
            for (j = 0; j <= degree; j++)
                rsqrt_c[j] = rsqrt_c[j] * sqrt(xkk);

            /*
            ** Now round the x^2 and x coefficients to single precision,
            ** by subtracting Chebyshev polynomials.  The additional error
            ** is negligible (less than 3%; e.g., if the polynomial was good to
            ** 27 bits, it's degraded to only 27-log2(1.03) = 26.96 bits).
            **
            ** The algebra is simplified by expressing the range xa..xb in terms of
            ** the range's midpoint and radius.
            */

            xm = (xb + xa)/2;
            xr = (xb - xa)/2;
            z = xm / xr;

            /*
            ** The Chebyshev polynomials we subtract are multiples of:
            **
            **      w         <->   (x-xm)/xr
            **      1-2*w^2   <->   1-2*((x-xm)/xr)^2
            **
            ** The x terms are collected, scaled (by t), and subtracted from the
            ** polynomial coefficients.
            **
            ** First we subtract (a multiple of) the 2nd degree Chebyshev polynomial
            ** to produce a new polynomial with the desired (representable in single
            ** precision) 2nd degree polynomial coefficient.  This minimizes the
            ** maximum absolute error between the 'Remes' polynomial and the new
            ** polynomial (since the difference is a Chebyshev polynomial, which
            ** has the 'equal ripple' property).
            **
            ** Then we subtract (a multiple of) the 1st degree Chebyshev polynomial
            ** to produce a new polynomial with the desired (representable in single
            ** precision) 1st degree coefficient.  This minimizes the maximum
            ** absolute error between the previous polynomial and the newer one
            ** (under the constraints of having the same 2nd degree coefficient,
            ** and the desired 1st degree coefficient).  The 0th degree coefficient
            ** is rounded to double precision (somebody's got to!), and this has
            ** no significant effect on the single precision result.
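**
** The second subtraction step above (rounding the 1st degree coefficient and
** folding the rounding error into the constant term) can be sketched in C as
** follows.  This is an illustration only: quad_poly, round_linear_coef and
** eval_poly are hypothetical names, and a (float) cast under round-to-nearest
** is assumed to play the role of bround(..., S_PRECISION).

```c
#include <assert.h>

typedef struct { double c2, c1, c0; } quad_poly;   // c2*x^2 + c1*x + c0

// Round c1 to single precision and compensate in c0.  The polynomial then
// changes by -t*(x - xm), which is at most |t|*xr in magnitude on [xa, xb],
// where xm is the midpoint and xr the radius of the range.
static quad_poly round_linear_coef(quad_poly p, double xa, double xb)
{
    assert(xa < xb);
    double xm = (xb + xa) / 2;
    double t = p.c1 - (double)(float)p.c1;   // rounding error of c1
    p.c1 -= t;                               // now exactly a float value
    p.c0 += t * xm;                          // t * z * xr, since z = xm/xr
    return p;
}

// Horner evaluation, used here only to compare the two polynomials.
static double eval_poly(quad_poly p, double x)
{
    return (p.c2 * x + p.c1) * x + p.c0;
}
```

** Without the adjustment to c0 the change would be t*x, i.e. about |t| near
** x = 1; with it, the change shrinks by the factor xr/xm.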
            **
            ** Is the resulting polynomial optimal?  Nope; nobody claims it is.
            ** Is it 'best' in some sense?  Yes -- the theory is clear and the code
            ** is short (disregarding this philippic).  Is it close enough?  Yep.
            ** Why?  That's a good question....
            **
            ** To see why this works, consider the polynomial for 1/sqrt(x) for
            ** 1 <= x < 1+2^-7,
            **
            **      0.37... x^2 + -1.24... x + 1.87...
            **
            ** Simply rounding the x coefficient to 24 bits may corrupt the result
            ** of the polynomial by as much as (1+2^-7) * 0.5*s_lsb(1.24), where
            ** s_lsb(z) = 2^floor(log2(|z|) + 1 - 24) is the value of z's least
            ** significant bit when z is expressed in single precision.  This is
            ** as much as 2^-24, which is 2*s_lsb(1/sqrt(x)) -- two single-precision
            ** lsb of the result!  Rounding the x^2 coefficient has similar effects,
            ** affecting the result by 1/2 single-precision lsb.  We can do better.
            **
            ** If rounding increases the x coefficient by t, |t| <= 0.5*lsb(1.24),
            ** the corruption can be partly compensated by adjusting the constant
            ** coefficient, decreasing it by (for example) t*(1 + 1+2^-7)/2.
            ** The corruption is then:
            **
            **      t*( x - (1+1+2^-7)/2 )
            **
            ** Since 1 <= x < 1+2^-7, and |t| <= 0.5*lsb(1.24), we have:
            **
            **      | t*( x - (1+1+2^-7)/2 ) | <= 0.5*lsb(1.24) * 2^-8 = 2^(-24 -8)
            **
            ** which is only 0.0078125*s_lsb(1/sqrt(x)) -- a factor of 256 smaller
            ** than the corruption from simply rounding the x coefficient.
            **
            ** To minimize the (absolute value of the) maximum corruption, we add
            ** a multiple of a Chebyshev polynomial, for the particular range of x,
            ** because Chebyshev polynomials are 'minimax' (or 'equal ripple')
            ** polynomials.
            ** For the range -1 <= w <= 1, the Chebyshev polynomials are:
            **
            **      1, w, 2*w^2-1, 4*w^3-3*w, ....
            **
            ** To convert these to polynomials in x for the range a <= x <= b,
            ** substitute (x-m)/r, with m = (b+a)/2, r = (b-a)/2, and z = m/r.
            ** The Chebyshev polynomials become:
            **
            **      1, x/r - z, 2*(x/r)^2 - 4*z*(x/r) + 2*z^2-1,
            **      4*(x/r)^3 - 12*z*(x/r)^2 + (12*z^2-3)*(x/r) - 4*z^3+3*z, ....
            **
            ** For 1 <= x < 1+2^-7, these are:
            **
            **      1, 2^8*x - (2^8+1), 2^17*x^2 - (2^18+2^10)*x + (2^17+2^10+1),
            **      2^26*x^3 - 3*(2^26+2^18)*x^2 + 3*(2^26+2^19+3*2^8)*x
            **      - (2^26+3*2^18+2^11+2^8+1), ....
            **
            ** Each of these is 'equal ripple', oscillating between +/-1.  We see
            ** our previous adjustment, ( x - (1+1+2^-7)/2 ), appear here with a
            ** factor of 2^8.  Scaling it by t*2^-8 gives our previous result; this
            ** scaling also reduces the 'ripple' to +/-t*2^-8.
            **
            ** When we use the 2nd degree Chebyshev polynomial to round the 2nd
            ** degree coefficient to single precision, we must scale the polynomial
            ** by a factor of t*2^-17, where here |t| <= 0.5*lsb(0.37).  This means
            ** that the effect of this corruption, the size of the 'ripple', is less
            ** than 0.5*lsb(0.37)*2^-17 = 2^-43, or 2^-18*s_lsb(1/sqrt(x)).  This is
            ** far better than the 1/2 lsb we got when we simply rounded the x^2
            ** coefficient.
            **
            ** Can this technique be applied to other polynomial coefficients?
            ** It is an invention of my own conception developed outside the term
            ** of my contract, and for which I've received no compensation.
            */

            t = rsqrt_c[2] - bround(rsqrt_c[2], S_PRECISION);
            rsqrt_c[2] = rsqrt_c[2] - t;
            rsqrt_c[1] = rsqrt_c[1] + t * 2*z * xr;
            rsqrt_c[0] = rsqrt_c[0] + t * (0.5-z^2) * xr^2;

            t = rsqrt_c[1] - bround(rsqrt_c[1], S_PRECISION);
            rsqrt_c[2] = rsqrt_c[2];
            rsqrt_c[1] = rsqrt_c[1] - t;
            rsqrt_c[0] = rsqrt_c[0] + t * z * xr;

            t = rsqrt_c[0] - bround(rsqrt_c[0], D_PRECISION);

            printf("{\t%.10r,\t%.10r,\t%.20r\t},\n",
                rsqrt_c[2], rsqrt_c[1], rsqrt_c[0]);
        }
    }

    /*
    ** Print the trailer.
*/ print; print "};"; print; @end_divert @eval my $outText = MphocEval( GetStream( "divertText" ) ); \ my $headerText = GetHeaderText( STR(BUILD_FILE_NAME), \ "Double precision square root table", __FILE__); \ print "$headerText\n\n$outText"; #endif /* MAKE_INCLUDE */ /*----------------------------------------------------------------------------*/ /* Testing */ /*----------------------------------------------------------------------------*/ #if MAKE_MTC @divert > dpml_sqrt.mtc build default = "sqrt.a"; function SINGLE_SQRT = F_CHAR F_SQRT_NAME(F_CHAR.v.r); function FAST_SINGLE_SQRT = F_CHAR F_FAST_SQRT_NAME(F_CHAR.v.r); function DOUBLE_SQRT = B_CHAR B_SQRT_NAME(B_CHAR.v.r); function FAST_DOUBLE_SQRT = B_CHAR B_FAST_SQRT_NAME(B_CHAR.v.r); function MP_SQRT = void mp_sqrt(m.r.r, m.r.w); type SQRT_ACCURACY = accuracy error = lsb; stats = max; points = 1024; ; domain SINGLE_SQRT_DENORMS = { [ 0.0 , 1e-37 ]:uniform:10001 } ; domain DOUBLE_SQRT_DENORMS = { [ 0.0 , 1e-307 ]:uniform:10001 } ; domain SINGLE_SQRT_ACCURACY = { [ 0.0 , 17.0 ]:uniform:100001 } ; domain DOUBLE_SQRT_ACCURACY = { [ 0.0 , 17.0 ]:uniform:100001 } ; domain SQRT_KEYPOINTS = lsb = 0.5; { 2.0 | der } { 5.0 | der } { 10.0 | der } lsb = 0.5; { MTC_POS_TINY | der } { MTC_POS_HUGE | der } { 0.0 | 0.0 } { 1.0 | 1.0 } { MTC_NEG_ZERO | MTC_NEG_ZERO } { MTC_POS_INFINITY | MTC_POS_INFINITY } { MTC_NAN | MTC_NAN } ; domain FAST_SINGLE_SQRT_KEYPOINTS = lsb = 1.0; { 2.0 | der } { 5.0 | der } { 10.0 | der } lsb = 1.0; { MTC_POS_TINY | der } { MTC_POS_HUGE | der } { 0.0 | 0.0 } { 1.0 | 1.0 } { MTC_NEG_ZERO | MTC_NEG_ZERO } { MTC_POS_INFINITY | MTC_POS_INFINITY } { MTC_NAN | MTC_NAN } ; domain FAST_DOUBLE_SQRT_KEYPOINTS = lsb = 2.0; { 2.0 | der } { 5.0 | der } { 10.0 | der } lsb = 2.0; { MTC_POS_TINY | der } { MTC_POS_HUGE | der } { 0.0 | 0.0 } { 1.0 | 1.0 } { MTC_NEG_ZERO | MTC_NEG_ZERO } { MTC_POS_INFINITY | MTC_POS_INFINITY } { MTC_NAN | MTC_NAN } ; test sqrt_acc_sd = type = SQRT_ACCURACY; domain = 
SINGLE_SQRT_ACCURACY;
    function = SINGLE_SQRT;
    comparison_function = FAST_DOUBLE_SQRT;
    output =
        file = "sqrt_acc_sd.out";
    ;
;

test sqrt_denorm_acc_sd =
    type = SQRT_ACCURACY;
    domain = SINGLE_SQRT_DENORMS;
    function = SINGLE_SQRT;
    comparison_function = FAST_DOUBLE_SQRT;
    output =
        file = "sqrt_denorm_acc_sd.out";
    ;
;

test fast_sqrt_acc_sd =
    type = SQRT_ACCURACY;
    domain = SINGLE_SQRT_ACCURACY;
    function = FAST_SINGLE_SQRT;
    comparison_function = FAST_DOUBLE_SQRT;
    output =
        file = "fast_sqrt_acc_sd.out";
    ;
;

test sqrt_acc_dm =
    type = SQRT_ACCURACY;
    domain = DOUBLE_SQRT_ACCURACY;
    function = DOUBLE_SQRT;
    comparison_function = MP_SQRT;
    output =
        file = "sqrt_acc_dm.out";
    ;
;

test sqrt_denorm_acc_dm =
    type = SQRT_ACCURACY;
    domain = DOUBLE_SQRT_DENORMS;
    function = DOUBLE_SQRT;
    comparison_function = MP_SQRT;
    output =
        file = "sqrt_denorm_acc_dm.out";
    ;
;

test fast_sqrt_acc_dm =
    type = SQRT_ACCURACY;
    domain = DOUBLE_SQRT_ACCURACY;
    function = FAST_DOUBLE_SQRT;
    comparison_function = MP_SQRT;
    output =
        file = "fast_sqrt_acc_dm.out";
    ;
;

test sqrt_key_sd =
    type = key_point;
    domain = SQRT_KEYPOINTS;
    function = SINGLE_SQRT;
    comparison_function = DOUBLE_SQRT;
    output =
        file = "sqrt_key_sd.out" ;
        style = verbose;
    ;
;

test sqrt_key_dm =
    type = key_point;
    domain = SQRT_KEYPOINTS;
    function = DOUBLE_SQRT;
    comparison_function = MP_SQRT;
    output =
        file = "sqrt_key_dm.out" ;
        style = verbose;
    ;
;

test fast_sqrt_key_sd =
    type = key_point;
    domain = FAST_SINGLE_SQRT_KEYPOINTS;
    function = FAST_SINGLE_SQRT;
    comparison_function = FAST_DOUBLE_SQRT;
    output =
        file = "fast_sqrt_key_sd.out" ;
        style = verbose;
    ;
;

test fast_sqrt_key_dm =
    type = key_point;
    domain = FAST_DOUBLE_SQRT_KEYPOINTS;
    function = FAST_DOUBLE_SQRT;
    comparison_function = MP_SQRT;
    output =
        file = "fast_sqrt_key_dm.out" ;
        style = verbose;
    ;
;

@end_divert

#endif /* MAKE_MTC */

LIBRARY/float128/dpml_ux_trig.c
/******************************************************************************
  Copyright (c) 2007-2024, Intel Corp.
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are met:

    * Redistributions of source code must retain the above copyright notice,
      this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    * Neither the name of Intel Corporation nor the names of its contributors
      may be used to endorse or promote products derived from this software
      without specific prior written permission.

  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
  THE POSSIBILITY OF SUCH DAMAGE.
******************************************************************************/

#define BASE_NAME trig
#include "dpml_ux.h"

#if !defined(MAKE_INCLUDE)
#   include STR(BUILD_FILE_NAME)
#endif

/*
** OVERVIEW
** --------
**
** The implementation of the trig functions is based on four support routines:
** two common evaluation routines (one for sin/cos/sind/cosd and one for
** tan/cot/tand/cotd) together with two argument reduction routines, one for
** radian arguments and one for degree arguments.
**
** There are various reduction schemes that can be used for trigonometric
** functions.  The polynomial evaluation routines require that the terms in
** the series decrease in magnitude.  For the trig functions, this implies
** that an argument reduction scheme that returns a reduced argument with
** magnitude less than or equal to pi/4 is an appropriate choice.  In
** particular, we assume that for a given value, x, the argument reduction
** scheme (for both radians and degrees) produces two integers, I1 and I, and
** an unpacked floating point result, z, such that
**
**      x = (2*pi)*I1 + I*(pi/2) + z,   |z| <= pi/4
**
**      NOTE: having the degree reduction return the reduced
**      argument in radians permits the use of only one set
**      of polynomial coefficients and simplifies the evaluation
**      logic.
**
** The value of I we will refer to as the quadrant bits and z as the reduced
** argument.  We assume also that the argument reduction routines return both
** I and z to their caller.  (I1 is never needed in the subsequent
** computations, so it is not returned.)
**
** The following table gives an estimate of the number of terms in a polynomial
** and rational approximation for each of the basic trig functions.  For
** rational approximations the degrees of the numerator and denominator are
** presented as an ordered pair.  The approximation is assumed to be good to
** 128 bits for |x| <= pi/4.  The values in this table were extrapolated from
** the tables given in Hart et al.
**
**                         approximation form
**                      ------------------------
**      function        polynomial      rational
**      --------        ----------      --------
**        sin               12           (6, 6)
**        cos               12           (6, 6)
**        tan               29           (7, 7)
**
** So from the above table, it seems most efficient to evaluate sin and cos via
** polynomials and evaluate tangent via a rational approximation.  So we assume
** that for |x| <= pi/4, we have polynomials, S, C, P and Q such that
**
**      sin(x) ~ x*S(x^2)
**      cos(x) ~ C(x^2)
**      tan(x) ~ x*P(x^2) / Q(x^2)
**      cot(x) ~ Q(x^2) / [x*P(x^2)]
**
** Now, for any argument, x, given its reduced argument, z, and its quadrant
** bits, I, we can evaluate sin, cos, tan and cot of x according to Table 1.
** (For brevity we denote z*P(z^2) by p, Q(z^2) by q, etc.):
**
**                           Quadrant bits, I
**                      ----------------------------
**      function          0      1      2      3
**      --------        -----  -----  -----  -----
**        sin             s      c     -s     -c
**        cos             c     -s     -c      s
**        tan            p/q   -q/p    p/q   -q/p
**        cot            q/p   -p/q    q/p   -p/q
**
**                          Table 1
**                          -------
**
**
** REDUCTION INTERFACE:
** --------------------
**
** As mentioned earlier, the overall design of the trig routines is
** dependent on two routines to do argument reduction.  The prototype for
** these functions is:
**
**      WORD
**      __reduce(
**          _UX_FLOAT * unpacked_argument,
**          INT_64      octant,
**          _UX_FLOAT * reduced_argument
**          )
**
** Assuming that 'unpacked_argument' points to a _UX_FLOAT data item with value
** x, then the semantics of the reduction routines are to compute integers I1
** and I, and a floating point value, z, such that
**
**      x + octant*(CYCLE/4) = (2*CYCLE)*I1 + I*(CYCLE/2) + z,  |z| < CYCLE/4
**
** Note that performing the reduction on x + octant*(CYCLE/4), rather than x,
** not only allows us to deal with the _vo entry points easily, it also
** permits easy use of the identities cos(x) = sin(x + CYCLE/2) and cot(x) =
** -tan(x + CYCLE/2) to consolidate the overall processing.
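**
** As an informal illustration of Table 1 (a minimal sketch, not the library's
** code): the helpers toy_S, toy_C, toy_reduce and toy_eval below are
** hypothetical and use plain doubles.  Short Taylor series stand in for the
** S and C polynomials, tan and cot are formed from s and c rather than from
** a separate rational P/Q, and the reduction assumes CYCLE = pi and is only
** adequate for moderate |x| and even octants.
**

```c
#include <assert.h>

// sin(z) ~ z*S(z^2): truncated Taylor series in w = z^2 (illustrative only).
static double toy_S(double w)
{
    return 1.0 - w / 6.0 * (1.0 - w / 20.0 * (1.0 - w / 42.0 *
           (1.0 - w / 72.0)));
}

// cos(z) ~ C(z^2): truncated Taylor series in w = z^2 (illustrative only).
static double toy_C(double w)
{
    return 1.0 - w / 2.0 * (1.0 - w / 12.0 * (1.0 - w / 30.0 *
           (1.0 - w / 56.0 * (1.0 - w / 90.0))));
}

// Reduce x + octant*(pi/4) to quadrant bits I (returned, 0..3) and a
// remainder z with |z| <= pi/4.  Only even octants are handled here.
static int toy_reduce(double x, int octant, double *z)
{
    const double half_pi = 1.5707963267948966;
    double t = x / half_pi;
    long n = (long) (t + (t >= 0.0 ? 0.5 : -0.5));  // n = nint(x / (pi/2))

    assert((octant & 1) == 0);
    *z = x - (double) n * half_pi;                  // reduced argument
    n += octant >> 1;                               // fold in the octant
    return (int) (((n % 4) + 4) % 4);               // quadrant bits
}

enum { TOY_SIN, TOY_TAN, TOY_COT };

// Evaluate one row of Table 1 for the given function selector.
static double toy_eval(double x, int octant, int func)
{
    double z, r;
    int quadrant = toy_reduce(x, octant, &z);
    double s = z * toy_S(z * z);
    double c = toy_C(z * z);

    if (func == TOY_SIN) {                  // row sin: s, c, -s, -c
        r = (quadrant & 1) ? c : s;
        return (quadrant & 2) ? -r : r;
    }
    // rows tan and cot: the ratio is inverted in odd quadrants (and once
    // more for cot), and the sign is negative in odd quadrants
    r = ((quadrant & 1) ^ (func == TOY_COT)) ? c / s : s / c;
    return (quadrant & 1) ? -r : r;
}
```

** With these helpers, toy_eval(x, 0, TOY_SIN) approximates sin(x), and
** toy_eval(x, 2, TOY_SIN) approximates cos(x): an octant of 2 adds one
** quadrant, which selects the cos row of Table 1 from the sin row.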
**
**
**
** EVALUATION INTERFACE:
** ---------------------
**
** The prototype for each of the two evaluation routines is:
**
**      void
**      __trig_evaluate(
**          UX_FLOAT * unpacked_argument,
**          WORD       octant,
**          U_WORD     function_code,
**          UX_FLOAT * unpacked_result
**          );
**
** The evaluation routines need not know whether the evaluation is for degrees
** because the appropriate reduction is done based on the value of
** function_code.
*/

#if !defined(UX_RADIAN_REDUCE)
#   define UX_RADIAN_REDUCE    __INTERNAL_NAME(ux_radian_reduce__)
#endif

/*
** The radian reduction code is rather large and has a rather detailed
** explanation.  Consequently, it is contained in a separate file and is
** included here.
*/

#if !defined(MAKE_INCLUDE)
#   include "dpml_ux_radian_reduce.c"
#endif

/*
** UX_DEGREE_REDUCE performs argument reduction for degree arguments.  The
** reduction is performed in three phases:
**
**      (1) if |x| >= 2^141, reduce modulo 360 to a value less than 2^141
**          by operating on the exponent field of x
**      (2) if |x| > 2^15, reduce modulo 360 to a value less than 2^15
**          by operating on the integer portion of x
**      (3) if |x| < 2^15, compute I = nint(x/90) and the reduced argument
**          as x - I*90
**
** Each of these phases is discussed in more detail in the code.
*/

#if !defined(UX_DEGREE_REDUCE)
#   define UX_DEGREE_REDUCE    __INTERNAL_NAME(ux_degree_reduce__)
#endif

static U_WORD
UX_DEGREE_REDUCE( UX_FLOAT * argument, WORD octant, UX_FLOAT * reduced_argument)
    {
    WORD cnt, digit_with_binary_pt, digit_num, w_tmp, quadrant;
    UX_SIGN_TYPE sign;
    UX_EXPONENT_TYPE exponent, k;
    UX_FRACTION_DIGIT_TYPE current_digit, tmp_digit, sum_digit, borrow;

    sign = G_UX_SIGN(argument);
    exponent = G_UX_EXPONENT(argument);

    if (exponent > (UX_PRECISION + 14))
        {
        /*
        ** This is a very large argument.
        ** We make use of the identity
        **
        **      8*(2^12)^(n+1) = 8*(136)^(n+1)          (mod 360)
        **                     = [8*(136)]*(136)^n
        **                     = (1088)*(136)^n
        **                     = 8*(136)^n              (mod 360)
        **
        ** Or employing induction, 8*(2^12)^n = 8 (mod 360)
        **
        ** If p is the precision of the data type, we begin by writing the
        ** input argument x as:
        **
        **      x = 2^n*f
        **        = 2^(n-p)*(2^p*f)
        **        = 2^(n-p)*F
        **
        ** where F = 2^p*f is an integer.  Now let k = floor((n - p - 3)/12)
        ** and r = n - p - 3 - 12*k.  Then
        **
        **      x = 2^(n-p)*F
        **        = 2^(12k + r + 3)*F
        **        = [8*2^(12k)]*(2^r*F)
        **        = [8*(2^12)^k]*(2^r*F)
        **        = 8*(2^r*F)                   (mod 360)
        **        = 2^(3 + r + p)*f
        **        = 2^(n - 12*k)*f
        **
        ** So the approach is to find k and subtract 12*k from the exponent
        ** field.  This will reduce the input argument to a number less than
        ** 2^(p + 14)
        **
        ** One last note.  We don't actually do an integer divide to get
        ** k.  Rather we multiply n by an integer that is effectively the
        ** reciprocal of 12.  This is easier to do if the exponent field
        ** is positive so we want to add a bias to the exponent that is
        ** divisible by 12 and that will force the exponent to be positive.
        ** We assume at this point that |exponent| < (1 << F_EXP_WIDTH).
        **
        ** Let the bias = 12*B, then
        **
        **      k = floor((n - p - 3)/12)
        **        = floor((n - p - 3 + 12*B - 12*B)/12)
        **        = floor((n - p - 3 + 12*B)/12 - B)
        **        = floor((n - p - 3 + 12*B)/12) - B
        **        = floor((n + (12*B - p - 3))/12) - B
        **
        **  ==> n - 12*k = n - 12*[floor((n + (12*B - p - 3))/12) - B]
        **               = n - 12*floor((n + (12*B - p - 3))/12) + 12*B
        */

#       define BIAS (12*(((1 << F_EXP_WIDTH) + 11)/12))

        exponent += (BIAS - UX_PRECISION - 3);
        UMULH((UX_FRACTION_DIGIT_TYPE) exponent, RECIP_TWELVE, k);
        exponent = (exponent + (UX_PRECISION + 3)) - 12*k;
        P_UX_EXPONENT(argument, exponent);
        }

    if (exponent >= 16)
        {
        /*
        ** For medium arguments, 2^15 < |x| < 2^142, we consider the fraction
        ** field of x as a sequence of digits.  The digits that are comprised
        ** entirely of "integer" bits are reduced modulo 360 using the
        ** identity 8*2^12 = 8 (mod 360).
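        **
        ** The identity can be checked numerically.  The sketch below is an
        ** illustration only: pow2_times8_mod360 and fold12 are hypothetical
        ** helpers, not routines from this library, and they work on plain
        ** 64-bit integers rather than on UX_FLOAT fraction digits.
        **

```c
#include <assert.h>
#include <stdint.h>

// Return 8*2^e mod 360 by repeated doubling, so nothing overflows.
static unsigned pow2_times8_mod360(unsigned e)
{
    unsigned r = 8;
    while (e-- > 0)
        r = (2 * r) % 360;
    return r;
}

// Sum the 12-bit "digits" of c until the result fits in 12 bits.  Since
// 8*2^12 = 8 (mod 360), 8*fold12(c) is congruent to 8*c modulo 360.
static uint64_t fold12(uint64_t c)
{
    while (c >> 12)
        c = (c & 0xfff) + (c >> 12);
    return c;
}
```

        ** Under these assumptions, pow2_times8_mod360(12*k + r) equals
        ** pow2_times8_mod360(r) for any k, which is why 12*k can be
        ** subtracted from the exponent, and fold12 mirrors the digit
        ** summing that is applied to the integer bits.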
        **
        ** Begin by writing |x| = 2^n*f, with f in the interval [1/2, 1) and
        ** define s = (n - 15) % k, where k is the number of bits per fraction
        ** digit (and K = 2^k).  If there are 4 digits per UX_FLOAT, then the
        ** following diagram indicates the relationship between n, s and the
        ** binary point of x:
        **
        **         |<---------- n - 15 -------->|    15    |<--
        **         +-----------+-----------+-----------+-----------+
        **    f :  |    F1     |    F2     |    F3     |    F4     |
        **         +-----------+-----------+-----------+-----------+
        **                              -->| s  |<--       ^
        **                                             binary pt
        **
        ** Suppose we now shift the bits of f, s bits to the left to get f'.
        ** Then the diagram would look like
        **
        **                                          -->|    15    |<--
        **         +-----------+-----------+-----------+-----------+-----------+
        **    f':  |    F0'    |    F1'    |    F2'    |    F3'    |    F4'    |
        **         +-----------+-----------+-----------+-----------+-----------+
        **                                                        ^
        **                                                    binary pt
        **
        ** and if we denote the number of digits per UX_FLOAT by N, then
        **
        **      x = 2^(n-s)*(F0' + F1'/K + F2'/K^2 + ... + FN'/K^N)
        **
        ** Now n - 15 - s is a multiple of k, i.e. n - s = j*k + 15, so that
        ** 2^(n-s) = 2^(j*k+15) = 2^15*K^j and
        **
        **      x = 2^(n-s)*(F0' + F1'/K + F2'/K^2 + ... + FN'/K^N)
        **        = 2^15*(K^j)*(F0' + F1'/K + F2'/K^2 + ... + FN'/K^N)
        **        = 2^15*[F0'*K^j + F1'*K^(j-1) + ... + FN'*K^(j-N)]
        **        = 2^15*A + 2^15*b
        **
        ** where
        **
        **      A = F0'*K^j + F1'*K^(j-1) + ... + Fj'
        **      b = Fj+1'/K + ... + FN'/K^(N-j)
        **
        ** If we define B = trunc(2^12*b) and b' = 2^15*b - 2^3*B, then
        **
        **      x = 2^15*A + 2^15*b
        **        = 2^15*A + 2^3*B + b'
        **        = 8*(2^12*A + B) + b'
        **        = 8*C + b'
        **
        ** Now let C_lo be the low 12 bits of C and C_hi be the remaining
        ** bits, then
        **
        **      8*C = 8*(C_lo + 2^12*C_hi)
        **          = 8*(C_lo + 136*C_hi)       (mod 360)
        **          = 8*C_lo + 8*136*C_hi
        **          = 8*C_lo + 8*C_hi           (mod 360)
        **          = 8*(C_lo + C_hi)
        **
        ** Thus we effectively reduced the value of 8*C by (almost) 12 bits
        ** modulo 360.  Obviously, we can iterate on this process until
        ** we produce a value C' which is less than 2^12 and 8*C' = 8*C modulo
        ** 360.
        ** In order to increase performance (and simplify the
        ** implementation) the actual code below doesn't do the reduction 12
        ** bits at a time initially.  Rather it first does the reduction 24 or
        ** 60 bits at a time (depending on the digit size), and then does
        ** 12 bit reduction on that result.
        **
        **      NOTE: In order to avoid copying the input argument to
        **      a work buffer and to simplify the logic, the following code
        **      overlays the sign and exponent field of a UX_FLOAT type
        **      with an "extra" digit.
        */

#       if BITS_PER_UX_FRACTION_DIGIT_TYPE > (BITS_PER_UX_EXPONENT_TYPE + \
            BITS_PER_UX_SIGN_TYPE)
#           error "Need work buffer for this UX_FLOAT struct"
#       endif

        digit_with_binary_pt = exponent - 15;
        cnt = digit_with_binary_pt & (BITS_PER_UX_FRACTION_DIGIT_TYPE - 1);
        digit_with_binary_pt >>= __LOG2(BITS_PER_UX_FRACTION_DIGIT_TYPE);
        tmp_digit = 0;
        exponent -= cnt;
        if (cnt)
            { /* shift digit right (in memory) */
            w_tmp = BITS_PER_UX_FRACTION_DIGIT_TYPE - cnt;
            current_digit = G_UX_LSD(argument);
            P_UX_LSD(argument, current_digit << cnt);
#           if NUM_UX_FRACTION_DIGITS == 4
                tmp_digit = G_UX_FRACTION_DIGIT(argument, 2);
                P_UX_FRACTION_DIGIT(argument, 2,
                   (tmp_digit << cnt) | ( current_digit >> w_tmp));
                current_digit = G_UX_FRACTION_DIGIT(argument, 1);
                P_UX_FRACTION_DIGIT(argument, 1,
                   (current_digit << cnt) | ( tmp_digit >> w_tmp));
#           endif
            tmp_digit = G_UX_MSD(argument);
            P_UX_MSD(argument, (tmp_digit << cnt) | ( current_digit >> w_tmp));
            tmp_digit >>= w_tmp;
            }
        /* P_UX_FRACTION_DIGIT(argument, -1, tmp_digit); */
        /*
        ** Because of the compiler warning we are replacing the above
        ** line in the source.
*/ *(&(((UX_FLOAT*)(argument))->fraction[0])-1) = tmp_digit; /* ** Extract B from the digit that contains the binary point */ sum_digit = G_UX_FRACTION_DIGIT(argument, digit_with_binary_pt) >> (BITS_PER_UX_FRACTION_DIGIT_TYPE - 12); /* ** Loop through the remaining integer digits and add them to B */ # define MOD_360_BITS_PER_DIGIT (12*(BITS_PER_UX_FRACTION_DIGIT_TYPE/12)) # define MOD_360_DIGIT_MASK MAKE_MASK(MOD_360_BITS_PER_DIGIT, 0) digit_num = digit_with_binary_pt; cnt = 0; while (digit_num >= 0) { current_digit = G_UX_FRACTION_DIGIT(argument, --digit_num); P_UX_FRACTION_DIGIT(argument, digit_num, 0); if (cnt) { sum_digit += ((current_digit << cnt) & 0xfff); w_tmp = 12 - cnt; current_digit >>= w_tmp; cnt = -w_tmp; } sum_digit = (sum_digit + (current_digit & MOD_360_DIGIT_MASK)) + (current_digit >> MOD_360_BITS_PER_DIGIT); cnt += (BITS_PER_UX_FRACTION_DIGIT_TYPE - MOD_360_BITS_PER_DIGIT); } /* ** For 64 bit digits, at this point sum_digit can have five 12 bit ** "digits" plus a carry "digit" for a total of six. So it is ** more efficient to compress sum_digit 24 bits at a time rather than ** 12 bits at a time. */ # if (BITS_PER_UX_FRACTION_DIGIT_TYPE == 64) sum_digit = (sum_digit & 0xffffff) + ((sum_digit >> 24) & 0xffffff) + ((sum_digit >> 48) & 0xffffff); # endif /* ** At this point sum_digit may contain two 12 bit "digits" plus a ** carry "digit". So we recurse (at most twice) to reduce it to 12 ** bits modulo 360. */ while ((tmp_digit = (sum_digit >> 12))) sum_digit = (sum_digit & 0xfff) + tmp_digit; /* ** Now put the reduced integer into the original fraction field, ** normalize the result, and calculate the exponent value. 
        */
        current_digit = G_UX_FRACTION_DIGIT(argument, digit_with_binary_pt);
        current_digit &= MAKE_MASK(BITS_PER_UX_FRACTION_DIGIT_TYPE - 12, 0);
        current_digit |= (sum_digit << (BITS_PER_UX_FRACTION_DIGIT_TYPE - 12));
        P_UX_FRACTION_DIGIT(argument, digit_with_binary_pt, current_digit);
        P_UX_EXPONENT(argument, exponent);
        exponent -= NORMALIZE(argument);
        }

    /*
    ** At this point |x| < 2^15 so that if I = nint(x/90), I < 2^9 and
    ** I*90 requires at most 15 significant bits.  This means that we
    ** can reduce x by working only with its most significant digit.
    **
    ** Let F be the high k bits of the fraction of x, where k is the number
    ** of bits per fraction digit and K = 2^k.  Further, let R be a k-1 bit
    ** integer such that 1/90 ~ R/(32*K).  (I.e. R is the high bits of 1/90
    ** unnormalized by one bit.)  We can now write x = 2^n*(F + e)/K and
    ** 1/90 = (R + d)/(32*K), where |e| < 1 and |d| < 1/2.  Consequently
    ** we have:
    **
    **      x/90 = (2^n*f)*(1/90)
    **           = 2^n*[(F + e)/K]*[(R + d)/(32*K)]
    **           = 2^(n-5)*(F*R + e*R + d*F + e*d)/K^2
    **           = 2^(n-5)*(K*hi(F*R) + lo(F*R) + e*R + d*F + e*d)/K^2
    **
    ** Now K*hi(F*R) > K^2/8 and | lo(F*R) + e*R + d*F + e*d | < 2K and
    ** so the relative error in neglecting lo(F*R) + e*R + d*F + e*d is less
    ** than one part in 2^(k-4).  Since k is at least 32, the relative error
    ** is very small.  We have then
    **
    **      x/90 = 2^(n-5)*[K*hi(F*R) + lo(F*R) + e*R + d*F + e*d]/K^2
    **           ~ 2^(n-5)*hi(F*R)/K
    */

    w_tmp = exponent - 5;
    P_UX_SIGN(argument, 0);
    current_digit = G_UX_MSD(argument);
    if (w_tmp > 0)
        { UMULH( current_digit, MSD_OF_RECIP_90, tmp_digit); }
    else
        { /* I = 0 */
        w_tmp = 1;
        tmp_digit = 0;
        }

    /* I ~ x/90, "add in octant" and round to nearest integer */

    cnt = BITS_PER_UX_FRACTION_DIGIT_TYPE - w_tmp;
    tmp_digit = (tmp_digit + ((octant & 1) << (cnt - 1)) + SET_BIT(cnt - 1)) &
                  ~MAKE_MASK(cnt, 0);

    /* Get quadrant bits and adjust for sign of the argument */

    quadrant = (tmp_digit >> cnt);
    quadrant = (sign) ?
        -quadrant : quadrant;
    quadrant += (octant >> 1);

    /* now subtract I*90 from x */

#   define MSD_OF_NINETY (((UX_FRACTION_DIGIT_TYPE) 45) << \
                            (BITS_PER_UX_FRACTION_DIGIT_TYPE - 6))

    UMULH(tmp_digit, MSD_OF_NINETY, tmp_digit);
    tmp_digit = (current_digit >> 2) - tmp_digit;
    current_digit = (current_digit & 3) | (4*tmp_digit);
    if (((UX_SIGNED_FRACTION_DIGIT_TYPE) tmp_digit) < 0)
        {
        sign ^= UX_SIGN_BIT;
        sum_digit = G_UX_LSD(argument);
        tmp_digit = -sum_digit;
        borrow = (sum_digit != 0);
        P_UX_LSD(argument, tmp_digit);
#       if ( NUM_UX_FRACTION_DIGITS == 4)
            sum_digit = G_UX_FRACTION_DIGIT(argument, 2);
            tmp_digit = - (sum_digit + borrow);
            borrow = (sum_digit != 0) | borrow;
            P_UX_FRACTION_DIGIT(argument, 2, tmp_digit);
            sum_digit = G_UX_FRACTION_DIGIT(argument, 1);
            tmp_digit = - (sum_digit + borrow);
            borrow = (sum_digit != 0) | borrow;
            P_UX_FRACTION_DIGIT(argument, 1, tmp_digit);
#       endif
        current_digit = - (current_digit + borrow);
        }
    P_UX_MSD(argument, current_digit);
    NORMALIZE(argument);

    /* Last but not least, convert to radians */

    MULTIPLY(argument, UX_PI_OVER_180, reduced_argument);
    UX_TOGGLE_SIGN(reduced_argument, sign);
    return quadrant;
    }

/*
** UX_SINCOS is the common evaluation routine for all of the sin/cos and
** sind/cosd entry points.  UX_SINCOS invokes the appropriate reduction
** routine (radian or degrees) and then performs 1 or 2 polynomial evaluations
** on the reduced argument to get the result (or results, for sincos and
** sincosd)
*/

#define ODD_POLY_FLAGS  SQUARE_TERM | ALTERNATE_SIGN | POST_MULTIPLY
#define EVEN_POLY_FLAGS SQUARE_TERM | ALTERNATE_SIGN
#define SIN_POLY_FLAGS  NUMERATOR_FLAGS( ODD_POLY_FLAGS )
#define COS_POLY_FLAGS  DENOMINATOR_FLAGS( EVEN_POLY_FLAGS )

WORD
UX_SINCOS( UX_FLOAT * unpacked_argument, WORD octant, WORD function_code,
  UX_FLOAT * unpacked_result)
    {
    WORD quadrant, poly_type;
    UX_FLOAT reduced_argument;
    U_WORD (* reduce)( UX_FLOAT *, WORD, UX_FLOAT *);

    /* Get the quadrant bits and the reduced argument */

    reduce = (function_code & DEGREE) ?
        UX_DEGREE_REDUCE : UX_RADIAN_REDUCE;
    quadrant = reduce( unpacked_argument, octant, &reduced_argument );
    function_code &= ~DEGREE;

    /*
    ** Select the polynomial coefficients and the form of the
    ** polynomial based on the quadrant the reduced argument
    ** lies in.  NOTE: the difference between the sin and cos
    ** has been accounted for in the value of octant.
    */

    if ( SINCOS_FUNC == function_code )
        {
        poly_type = SIN_POLY_FLAGS | COS_POLY_FLAGS | NO_DIVIDE;

        /* Adjust location of sin/cos polynomials */
        poly_type |= ( (quadrant & 1) ? SWAP : 0 );
        }
    else if (quadrant & 1)
        /* We need to evaluate C(x^2) */
        poly_type = SKIP | COS_POLY_FLAGS;
    else
        /* We need to evaluate x*S(x^2) */
        poly_type = SKIP | SIN_POLY_FLAGS;

    /*
    ** Evaluate the polynomial and set the sign based on the quadrant
    */

    EVALUATE_RATIONAL(
        &reduced_argument,
        SINCOS_COEF_ARRAY,
        SINCOS_COEF_ARRAY_DEGREE,
        poly_type,
        unpacked_result);

    if (quadrant & 2)
        UX_TOGGLE_SIGN(&unpacked_result[0], UX_SIGN_BIT);

    /*
    ** If this is a sincos entry point, set the sign on the second
    ** result
    */

    if ((SINCOS_FUNC == function_code) && ((quadrant + 1) & 2))
        UX_TOGGLE_SIGN(&unpacked_result[1], UX_SIGN_BIT);

    return 0;   /* No error conditions for sin/cos */
    }

/*
** UX_TANCOT is the common evaluation routine for tan, cot, tand and cotd.
** UX_TANCOT invokes the appropriate reduction routine (radian or degrees) and
** then computes tan or cot as the ratio of two polynomials
**
** An important difference between UX_TANCOT and UX_SINCOS is that for the
** tand/cotd routines, the reduced argument may be zero.  Depending on the
** quadrant bits, the correct result would then be either 0 or +/- Inf.  The
** common tan/cot evaluation routine detects the +/- Inf case and returns an
** unpacked result with its exponent field set to a large positive value,
** denoted by UX_INFINITY_EXPONENT.
*/

#if !defined(UX_TANCOT)
#   define UX_TANCOT    __INTERNAL_NAME(ux_tancot__)
#endif

static WORD
UX_TANCOT( UX_FLOAT * unpacked_argument, WORD octant, WORD function_code,
  UX_FLOAT * unpacked_result)
    {
    WORD quadrant, div_flag;
    UX_FLOAT reduced_argument;
    U_WORD (* reduce)(UX_FLOAT *, WORD, UX_FLOAT *);

    /*
    ** Get the quadrant bits and the reduced argument, check for
    ** zero and process accordingly.
    */

    reduce = (function_code & DEGREE) ? UX_DEGREE_REDUCE : UX_RADIAN_REDUCE;
    quadrant = reduce( unpacked_argument, octant, &reduced_argument );
    div_flag = ((quadrant + (function_code >> 3)) & 1) ? SWAP : 0;

    if (0 == G_UX_MSD(&reduced_argument))
        { /* reduced argument is zero */
        UX_SET_SIGN_EXP_MSD(unpacked_result, 0, UX_ZERO_EXPONENT, 0);
        if ( div_flag /* == SWAP */ )
            {
            P_UX_EXPONENT(unpacked_result, UX_INFINITY_EXPONENT);
            P_UX_MSD(unpacked_result, UX_MSB);
            }
        return (function_code & TAN_FUNC) ? TAND_ODD_MULTIPLE_OF_90 :
            COTD_MULTIPLE_OF_180;
        }

    /*
    ** Evaluate z*P(z^2) and Q(z^2) and perform the appropriate
    ** division.  Set the sign bit according to the quadrant.
    */

    EVALUATE_RATIONAL(
        &reduced_argument,
        TANCOT_COEF_ARRAY,
        TANCOT_COEF_ARRAY_DEGREE,
        NUMERATOR_FLAGS( SQUARE_TERM | ALTERNATE_SIGN | POST_MULTIPLY) |
          DENOMINATOR_FLAGS( SQUARE_TERM | ALTERNATE_SIGN) | div_flag,
        unpacked_result);

    if (quadrant & 1)
        UX_TOGGLE_SIGN(unpacked_result, UX_SIGN_BIT);

    return G_UX_SIGN(unpacked_result) ? COTD_NEG_OVERFLOW : COTD_POS_OVERFLOW;
    }

/*
** Each of the trig routines calls a common routine, C_UX_TRIG, to unpack the
** input argument and then dispatch the result to the UX_SINCOS or UX_TANCOT
** evaluation routine.  For sincos and sincosd entry points, if the return
** value is written by the unpack routine, the common routine must take care
** to write the second result.
*/ #if !defined(C_UX_TRIG) # define C_UX_TRIG __INTERNAL_NAME(C_ux_trig__) #endif #define F_C_NAN_OR_INF_MASK (SET_BIT(F_C_INF) | SET_BIT(F_C_NAN)) static void C_UX_TRIG( _X_FLOAT * packed_argument, WORD octant, WORD function_code, U_WORD const * class_to_action_map, WORD underflow_error, _X_FLOAT * packed_result OPT_EXCEPTION_INFO_DECLARATION ) { _X_FLOAT *second_value; WORD fp_class, overflow_error; UX_FLOAT unpacked_result[3], unpacked_argument; WORD (* trig_eval)( UX_FLOAT *, WORD, WORD, UX_FLOAT *); trig_eval = (SINCOS_FUNC & function_code) ? UX_SINCOS : UX_TANCOT; fp_class = UNPACK( packed_argument, &unpacked_argument, class_to_action_map, packed_result OPT_EXCEPTION_INFO_ARGUMENT ); if (0 > fp_class) { /* If this is a SINCOS evaluation, write second result */ if (SINCOS_FUNC == (function_code & ~DEGREE)) { second_value = ((1 << F_C_BASE_CLASS(fp_class)) & F_C_NAN_OR_INF_MASK) ? &packed_result[0] : (_X_FLOAT *) _X_ONE; _X_COPY(second_value, &packed_result[1]); } return; } overflow_error = trig_eval( &unpacked_argument, octant, function_code, unpacked_result); PACK( unpacked_result, packed_result, underflow_error, overflow_error OPT_EXCEPTION_INFO_ARGUMENT ); if (SINCOS_FUNC == (function_code & ~DEGREE)) { /* pack second result for sincos evaluations */ PACK( unpacked_result + 1, packed_result + 1, NOT_USED, NOT_USED OPT_EXCEPTION_INFO_ARGUMENT ); } } /* ** The following 6 entry points implement the user level x-float sin/cos and ** sind/cosd functions */ #define TRIG_ENTRY(oct, code, map, under) \ X_X_PROTO(F_ENTRY_NAME, packed_result, packed_argument) \ { \ EXCEPTION_INFO_DECL \ DECLARE_X_FLOAT(packed_result) \ \ INIT_EXCEPTION_INFO; \ C_UX_TRIG( \ PASS_ARG_X_FLOAT(packed_argument), \ oct, code, map, under, \ PASS_RET_X_FLOAT(packed_result) \ OPT_EXCEPTION_INFO); \ RETURN_X_FLOAT(packed_result); \ } # #define TRIG_ENTRY_RR(oct, code, map, under) \ RR_X_PROTO(F_ENTRY_NAME, packed_result1, packed_result2, packed_argument) \ { \ EXCEPTION_INFO_DECL \ _X_FLOAT 
packed_result[2]; \ \ INIT_EXCEPTION_INFO; \ C_UX_TRIG( \ PASS_ARG_X_FLOAT(packed_argument), \ oct, code, map, under, \ packed_result /*PASS_RET_X_FLOAT(packed_result)*/ \ OPT_EXCEPTION_INFO); \ *packed_result1 = packed_result[0]; \ *packed_result2 = packed_result[1]; \ } #undef F_ENTRY_NAME #define F_ENTRY_NAME F_SIN_NAME TRIG_ENTRY(0, SIN_FUNC, SIN_CLASS_TO_ACTION_MAP, NOT_USED) #undef F_ENTRY_NAME #define F_ENTRY_NAME F_COS_NAME TRIG_ENTRY(2, COS_FUNC, COS_CLASS_TO_ACTION_MAP, NOT_USED) #undef F_ENTRY_NAME #define F_ENTRY_NAME F_SINCOS_NAME TRIG_ENTRY_RR(0, SINCOS_FUNC, SINCOS_CLASS_TO_ACTION_MAP, NOT_USED) #undef F_ENTRY_NAME #define F_ENTRY_NAME F_SIND_NAME TRIG_ENTRY(0, SIND_FUNC, SIND_CLASS_TO_ACTION_MAP, SIND_UNDERFLOW) #undef F_ENTRY_NAME #define F_ENTRY_NAME F_COSD_NAME TRIG_ENTRY(2, COSD_FUNC, COSD_CLASS_TO_ACTION_MAP, NOT_USED) #undef F_ENTRY_NAME #define F_ENTRY_NAME F_SINCOSD_NAME TRIG_ENTRY_RR(0, SINCOSD_FUNC, SINCOSD_CLASS_TO_ACTION_MAP, SIND_UNDERFLOW) /* ** The following 4 entry points implement the user level x-float tan/cot and ** tand/cotd functions */ #undef F_ENTRY_NAME #define F_ENTRY_NAME F_TAN_NAME TRIG_ENTRY(0, TAN_FUNC, TAN_CLASS_TO_ACTION_MAP, NOT_USED) #undef F_ENTRY_NAME #define F_ENTRY_NAME F_COT_NAME TRIG_ENTRY(0, COT_FUNC, COT_CLASS_TO_ACTION_MAP, NOT_USED) #undef F_ENTRY_NAME #define F_ENTRY_NAME F_TAND_NAME TRIG_ENTRY(0, TAND_FUNC, TAND_CLASS_TO_ACTION_MAP, TAND_UNDERFLOW) #undef F_ENTRY_NAME #define F_ENTRY_NAME F_COTD_NAME TRIG_ENTRY(0, COTD_FUNC, COTD_CLASS_TO_ACTION_MAP, NOT_USED) #if defined(MAKE_INCLUDE) @divert -append divertText precision = ceil(UX_PRECISION/8) + 4; # undef TABLE_NAME START_TABLE; TABLE_COMMENT("sin class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "SIN_CLASS_TO_ACTION_MAP\t"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(6) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_ERROR, 2) + CLASS_TO_ACTION( 
F_C_NEG_INF, RETURN_ERROR, 2) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_VALUE, 0) ); TABLE_COMMENT("cos class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "COS_CLASS_TO_ACTION_MAP\t"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(5) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_ERROR, 3) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 3) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_VALUE, 1) ); TABLE_COMMENT("sincos class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "SINCOS_CLASS_TO_ACTION_MAP"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(4) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_ERROR, 4) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 4) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_VALUE, 0) ); TABLE_COMMENT("sind class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "SIND_CLASS_TO_ACTION_MAP"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(3) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_ERROR, 5) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 5) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_UNPACKED, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_UNPACKED, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_VALUE, 0) ); TABLE_COMMENT("cosd class-to-action-mapping"); 
PRINT_CLASS_TO_ACTION_TBL_DEF( "COSD_CLASS_TO_ACTION_MAP"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(2) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_ERROR, 6) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 6) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 1) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_VALUE, 1) ); TABLE_COMMENT("sincosd class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "SINCOSD_CLASS_TO_ACTION_MAP"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(1) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_ERROR, 7) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 7) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_UNPACKED, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_UNPACKED, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_VALUE, 0) ); TABLE_COMMENT("Data for the above mappings"); PRINT_U_TBL_ITEM( /* data 1 */ ONE); PRINT_U_TBL_ITEM( /* data 2 */ SIN_OF_INFINITY); PRINT_U_TBL_ITEM( /* data 3 */ COS_OF_INFINITY); PRINT_U_TBL_ITEM( /* data 4 */ SINCOS_OF_INFINITY); PRINT_U_TBL_ITEM( /* data 5 */ SIND_OF_INFINITY); PRINT_U_TBL_ITEM( /* data 6 */ COSD_OF_INFINITY); PRINT_U_TBL_ITEM( /* data 7 */ SINCOSD_OF_INFINITY); TABLE_COMMENT("tan class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "TAN_CLASS_TO_ACTION_MAP\t"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(1) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( 
F_C_NEG_ZERO , RETURN_VALUE, 0) ); PRINT_U_TBL_ITEM( /* data 1 */ TAN_OF_INFINITY); TABLE_COMMENT("tand class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "TAND_CLASS_TO_ACTION_MAP"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(1) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_VALUE, 0) ); PRINT_U_TBL_ITEM( /* data 1 */ TAND_OF_INFINITY); TABLE_COMMENT("cot class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "COT_CLASS_TO_ACTION_MAP\t"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(1) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_UNPACKED, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_UNPACKED, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_ERROR, 2) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_ERROR, 3) ); PRINT_U_TBL_ITEM( /* data 1 */ COT_OF_INFINITY); PRINT_U_TBL_ITEM( /* data 2 */ COT_OF_ZERO); PRINT_U_TBL_ITEM( /* data 3 */ COT_OF_NEG_ZERO); TABLE_COMMENT("cotd class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "COTD_CLASS_TO_ACTION_MAP"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(1) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_ERROR, 2) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_ERROR, 3) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_ERROR, 4) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_ERROR, 5) ); PRINT_U_TBL_ITEM( /* data 1 */ COTD_OF_INFINITY); PRINT_U_TBL_ITEM( /* data 2 */ COTD_POS_OVERFLOW); PRINT_U_TBL_ITEM( /* data 3 */ 
        COTD_NEG_OVERFLOW);
    PRINT_U_TBL_ITEM( /* data 4 */ COTD_OF_ZERO);
    PRINT_U_TBL_ITEM( /* data 5 */ COTD_OF_NEG_ZERO);

    TABLE_COMMENT("Unpacked constants pi/180");
    PRINT_UX_TBL_ADEF_ITEM( "UX_PI_OVER_180\t\t", pi/180);

    TABLE_COMMENT("Packed constants 1");
    PRINT_F_TBL_ADEF_ITEM( "_X_ONE\t\t\t", 1);

    /*
    ** Now we compute the "high" digit of 1/90 and 1/12.  For 1/12, we would
    ** like to compute an integer R, such that trunc(E/12) = UMULH(R*E).  We
    ** state without proof here that if the number of bits per digit is
    ** 2*k + d, where d = 0 or 1, then N = 2^(2*k+d) + 2^(3-d) is divisible
    ** by 12 and taking R = N/12 gives the appropriate result.
    */

    PRINT_UX_FRACTION_DIGIT_TBL_VDEF_ITEM( "MSD_OF_RECIP_90\t\t",
        nint(bldexp(1/90, BITS_PER_UX_FRACTION_DIGIT_TYPE + 5)));

    PRINT_UX_FRACTION_DIGIT_TBL_VDEF_ITEM( "RECIP_TWELVE\t\t",
        ceil(bldexp(1/12, BITS_PER_UX_FRACTION_DIGIT_TYPE)));

    /*
    ** Now generate coefficients for computing sin.
    */

    function __sin(x)
        {
        if (x == 0)
            return 1;
        else
            return sin(x)/x;
        }

    save_precision = precision;
    precision = ceil(UX_PRECISION/8) + 8;

    max_arg = pi/4;

    remes(REMES_FIND_POLYNOMIAL + REMES_RELATIVE_WEIGHT + REMES_SQUARE_ARG,
         0, max_arg, __sin, UX_PRECISION, &sin_degree, &ux_rational_coefs);

    /*
    ** Now generate coefficients for computing cos and add them to the
    ** ux_rational coefficient array so that they can be accessed by the
    ** rational evaluation routine.
    */

    function __cos(x)
        {
        return cos(x);
        }

    remes(REMES_FIND_POLYNOMIAL + REMES_RELATIVE_WEIGHT + REMES_SQUARE_ARG,
         0, max_arg, __cos, UX_PRECISION, &cos_degree, &dummy_coefs);

    precision = save_precision;

    k = sin_degree + 1;
    for (i = 0; i <= cos_degree; i++)
        ux_rational_coefs[k++] = dummy_coefs[i];

    TABLE_COMMENT("Fixed point coefficients for sin and cos evaluation");
    PRINT_FIXED_128_TBL_ADEF("SINCOS_COEF_ARRAY\t");
    degree = print_ux_rational_coefs(sin_degree, cos_degree, 0);
    PRINT_WORD_DEF("SINCOS_COEF_ARRAY_DEGREE", degree );

    /*
    ** Last but not least, get the rational coefficients for tan/cot
    */

    function __tan(x)
        {
        if (x == 0)
            return 1;
        else
            return tan(x)/x;
        }

    save_precision = precision;
    precision = ceil(UX_PRECISION/8) + 8;

    max_arg = pi/4;

    remes(REMES_FIND_RATIONAL + REMES_RELATIVE_WEIGHT + REMES_SQUARE_ARG,
         0, max_arg, __tan, UX_PRECISION, &num_degree, &den_degree,
         &ux_rational_coefs);

    precision = save_precision;

    TABLE_COMMENT("Fixed point coefficients for tan and cot evaluation");
    PRINT_FIXED_128_TBL_ADEF("TANCOT_COEF_ARRAY\t");
    degree = print_ux_rational_coefs(num_degree, den_degree, 0);
    PRINT_WORD_DEF("TANCOT_COEF_ARRAY_DEGREE", degree );

    TABLE_COMMENT("Unpacked value of pi/4");
    PRINT_UX_TBL_ADEF_ITEM( "UX_PI_OVER_FOUR", pi/4);

    END_TABLE;

@end_divert
@eval my $tableText;                                                      \
      my $outText    = MphocEval( GetStream( "divertText" ) );            \
      my $defineText = Egrep( "#define", $outText, \$tableText );         \
         $outText    = "$tableText\n\n$defineText";                       \
      my $headerText = GetHeaderText( STR(BUILD_FILE_NAME),               \
                         "Definitions and constants trigonometric " .     \
                         "routines", __FILE__ );                          \
         print "$headerText\n\n$outText\n";

#endif

LIBRARY/float128/dpml_ux_log.c
/******************************************************************************
  Copyright (c) 2007-2024, Intel Corp.
  All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ******************************************************************************/ #define BASE_NAME log #include "dpml_ux.h" #if !defined(MAKE_INCLUDE) # include STR(BUILD_FILE_NAME) #endif /* The basic design of for the log functions relies on a common evaluation ** routine. The evaluation routine is based on the identities: ** ** logb(x) = ln(x)/ln(b) (1) ** ln(2^n*f) = n*ln(2) + ln(f) (2) ** ln[(1+x)/(1-x)] = 2*sum{ k = 0,... 
| x^(2k+1)/(2k+1) } (3) ** ** Assuming that x = 2^n*f, where 1/2 <= f < 1, we define g and m as: ** ** g = f; ** m = n; ** if (f < 1/sqrt(2)) ** { ** g = 2*f; ** m = n - 1; ** } ** ** Then x = 2^m*g where 1/sqrt(2) <= g < sqrt(2). From (2) and (3) it follows ** that ** g - 1 ** ln(x) = m*ln(2) + z*p(z^2) where z = ----- ** g + 1 ** ** Then from (1) it follows that ** ** logb(x) = m*ln(2)/ln(b) + z*p(z^2)/ln(b) ** = [m + z*r(z^2)]*[ln(2)/ln(b)] where r = p/ln(2) ** ** UX_LOG_POLY is a convenience function that allows for the evaluation of ** the log polynomial without having to know the address of the coefficients ** and automatically multiplies by ln2. */ void UX_LOG_POLY( UX_FLOAT * unpacked_argument, UX_FLOAT * unpacked_result) { EVALUATE_RATIONAL( unpacked_argument, LOG2_COEF_ARRAY, LOG2_COEF_ARRAY_DEGREE, NUMERATOR_FLAGS(SQUARE_TERM | POST_MULTIPLY), unpacked_result); MULTIPLY(unpacked_result, LN_2, unpacked_result); } void UX_LOG( UX_FLOAT * unpacked_argument, UX_FLOAT * scale, UX_FLOAT * unpacked_result) { UX_FLOAT tmp[2]; UX_EXPONENT_TYPE m; UX_FRACTION_DIGIT_TYPE f_hi; /* ** Compute z = (g - 1)/(g + 1). Make sure to restore the input ** argument to its original value in case the caller needs to use ** it again. 
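**
** The reduction above can be sketched in ordinary double precision. This
** is only an illustration under stated assumptions, not the routine used
** here: sketch_log2 is a hypothetical name, frexp() stands in for the
** unpacked exponent/fraction split, and a truncated series for
** 2*atanh(z) stands in for the fitted LOG2_COEF_ARRAY polynomial.

```c
#include <assert.h>
#include <math.h>

// Hypothetical double-precision analogue of the log2 reduction:
// x = 2^m * g with 1/sqrt(2) <= g < sqrt(2), z = (g - 1)/(g + 1),
// log2(x) = m + 2*atanh(z)/ln(2).
static double sketch_log2(double x)
{
    const double ln2 = 0.69314718055994530942;
    const double recip_sqrt2 = 0.70710678118654752440;
    int n, k;
    double f, g, z, z2, term, sum;

    assert(x > 0.0);            // only positive finite inputs in this sketch

    f = frexp(x, &n);           // x = 2^n * f with 1/2 <= f < 1
    g = f;
    if (f < recip_sqrt2) {      // move g into [1/sqrt(2), sqrt(2))
        g = 2.0 * f;
        n = n - 1;
    }

    z = (g - 1.0) / (g + 1.0);  // |z| <= (sqrt(2)-1)/(sqrt(2)+1), about 0.172
    z2 = z * z;

    // atanh(z) = sum over k of z^(2k+1)/(2k+1); 30 terms is far more
    // than double precision needs for |z| < 0.18.
    term = z;
    sum = 0.0;
    for (k = 0; k < 30; k++) {
        sum += term / (2 * k + 1);
        term *= z2;
    }

    return n + 2.0 * sum / ln2; // m + z*r(z^2)
}
```

** Multiplying the value returned by sketch_log2 by ln(2) or log10(2)
** would give ln(x) or log10(x), mirroring the optional scale argument.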
*/ m = G_UX_EXPONENT(unpacked_argument); f_hi = G_UX_MSD(unpacked_argument); if (f_hi <= ONE_OVER_SQRT_2) m--; UX_DECR_EXPONENT(unpacked_argument, m); ADDSUB(unpacked_argument, UX_ONE, ADD_SUB | MAGNITUDE_ONLY, &tmp[0]); UX_INCR_EXPONENT(unpacked_argument, m); DIVIDE(&tmp[1], &tmp[0], FULL_PRECISION, unpacked_result); /*printf("UX_LOG: tmp1=(%x %x) %llx %llx, tmp0=(%x %x) %llx %llx, r=(%x %x) %llx %llx\n", tmp[1].sign,tmp[1].exponent,tmp[1].fraction[0],tmp[1].fraction[1], tmp[0].sign,tmp[0].exponent,tmp[0].fraction[0],tmp[0].fraction[1], unpacked_result->sign,unpacked_result->exponent,unpacked_result->fraction[0],unpacked_result->fraction[1]);*/ /* Evaluate z*p(z^2) */ EVALUATE_RATIONAL( unpacked_result, LOG2_COEF_ARRAY, LOG2_COEF_ARRAY_DEGREE, NUMERATOR_FLAGS(SQUARE_TERM | POST_MULTIPLY), &tmp[0] ); /* Get m as a packed value and add to polynomial */ /*printf("UX_LOG: tmp1=(%x %x) %llx %llx, tmp0=(%x %x) %llx %llx, u_res=(%x %x) %llx %llx\n", tmp[1].sign,tmp[1].exponent,tmp[1].fraction[0],tmp[1].fraction[1], tmp[0].sign,tmp[0].exponent,tmp[0].fraction[0],tmp[0].fraction[1], unpacked_result->sign,unpacked_result->exponent,unpacked_result->fraction[0],unpacked_result->fraction[1]);*/ WORD_TO_UX(m, unpacked_result); //printf("m=%llx\n",(long long)m); ADDSUB(unpacked_result, &tmp[0], ADD | NO_NORMALIZATION, unpacked_result); /* multiply by scale */ //printf("u_res= (%x %x) %llx %llx\n",unpacked_result->sign,unpacked_result->exponent,unpacked_result->fraction[0],unpacked_result->fraction[1]); if (scale) MULTIPLY( unpacked_result, scale, unpacked_result); return; } #if !defined(C_UX_LOG) # define C_UX_LOG __INTERNAL_NAME(C_ux_log__) #endif static void C_UX_LOG( _X_FLOAT * packed_argument, U_WORD const * class_to_action_map, UX_FLOAT * scale, _X_FLOAT * packed_result OPT_EXCEPTION_INFO_DECLARATION ) { WORD fp_class, index; UX_FLOAT unpacked_argument, unpacked_result; fp_class = UNPACK( packed_argument, & unpacked_argument, class_to_action_map, packed_result 
OPT_EXCEPTION_INFO_ARGUMENT ); //printf("UX_LOG: packed arg=%llx %llx, unpacked_arg=(%x %x) %llx %llx\n",packed_argument->digit[0],packed_argument->digit[1],unpacked_argument.sign,unpacked_argument.exponent,unpacked_argument.fraction[0],unpacked_argument.fraction[1]); if (0 > fp_class) return; UX_LOG( &unpacked_argument, scale, &unpacked_result); PACK( &unpacked_result, packed_result, NOT_USED, NOT_USED OPT_EXCEPTION_INFO_ARGUMENT ); } #undef F_ENTRY_NAME #define F_ENTRY_NAME F_LN_NAME X_X_PROTO(F_ENTRY_NAME, packed_result,packed_argument) { EXCEPTION_INFO_DECL DECLARE_X_FLOAT(packed_result) INIT_EXCEPTION_INFO; C_UX_LOG( PASS_ARG_X_FLOAT(packed_argument), LOG_CLASS_TO_ACTION_MAP, LN_2, PASS_RET_X_FLOAT(packed_result) OPT_EXCEPTION_INFO); RETURN_X_FLOAT(packed_result); } #undef F_ENTRY_NAME #define F_ENTRY_NAME F_LOG2_NAME X_X_PROTO(F_ENTRY_NAME, packed_result,packed_argument) { EXCEPTION_INFO_DECL DECLARE_X_FLOAT(packed_result) INIT_EXCEPTION_INFO; C_UX_LOG( PASS_ARG_X_FLOAT(packed_argument), LOG2_CLASS_TO_ACTION_MAP, 0, PASS_RET_X_FLOAT(packed_result) OPT_EXCEPTION_INFO); RETURN_X_FLOAT(packed_result); } #undef F_ENTRY_NAME #define F_ENTRY_NAME F_LOG10_NAME X_X_PROTO(F_ENTRY_NAME, packed_result,packed_argument) { EXCEPTION_INFO_DECL DECLARE_X_FLOAT(packed_result) INIT_EXCEPTION_INFO; C_UX_LOG( PASS_ARG_X_FLOAT(packed_argument), LOG10_CLASS_TO_ACTION_MAP, LOG10_2, PASS_RET_X_FLOAT(packed_result) OPT_EXCEPTION_INFO); RETURN_X_FLOAT(packed_result); } /* ** If we compute log1p(x) as log(1+x), then for small arguments a loss of ** significance will occur when computing the reduced argument for the generic ** log evaluation. Consequently we screen out x such that ** ** 1/sqrt(2) <= 1 + x < sqrt(2), ** ** or equivalently, ** ** 1/sqrt(2) - 1 <= x < sqrt(2) - 1 (4) ** ** We do this comparison "approximately" and in several phases. First we ** screen x to lie in the interval (-1/2, 1/2) by looking at the exponent ** field of x. 
Then we eliminate arguments with |x| <= 1/4, since these are ** known to satisfy (4). At this point |x| = 2^(-1)*f and we can approximate ** 1 + x using only the high fraction digit of x, F1. Letting ** N = BITS_PER_DIGIT_TYPE: ** ** 1 + x = 2^(N-1)/2^(N-1) + 2^(-1)*F1/2^N ** = 2^(N-1)/2^(N-1) + F1/2^(N+1) ** = [2^(N-1) + F1/4]/2^(N-1) ** ** So we define an integer G such that G/2^(N-1) ~ 1 + x by, ** ** G <-- F1 >> 2 ** if (x < 0) ** G <-- -G ** G <-- G + (1 << (N-1)) ** ** At this point we define two other integers: ** ** I_RECIP_SQRT_2 <-- nint[2^(N-1)/sqrt(2)] ** I_SQRT_2 <-- nint[2^(N-1)*sqrt(2)] ** ** Then the range check: 1/sqrt(2) < 1 + x < sqrt(2) is "equivalent" to ** ** I_RECIP_SQRT_2 < G < I_SQRT_2. */ #undef F_ENTRY_NAME #define F_ENTRY_NAME F_LOG1P_NAME X_X_PROTO(F_ENTRY_NAME, packed_result,packed_argument) { WORD fp_class; UX_SIGN_TYPE sign; UX_EXPONENT_TYPE exponent; UX_FRACTION_DIGIT_TYPE f_hi; UX_FLOAT unpacked_argument, unpacked_result, tmp; DECLARE_X_FLOAT(packed_result) EXCEPTION_INFO_DECL INIT_EXCEPTION_INFO; fp_class = UNPACK( PASS_ARG_X_FLOAT(packed_argument), & unpacked_argument, LOG1P_CLASS_TO_ACTION_MAP, PASS_RET_X_FLOAT(packed_result) OPT_EXCEPTION_INFO ); if (0 > fp_class) RETURN_X_FLOAT(packed_result); /* ** Screen out negative values <= -1. For values less than ** -1, force "overflow". For arguments equal to -1, force ** "underflow". */ exponent = G_UX_EXPONENT( &unpacked_argument ); sign = G_UX_SIGN( &unpacked_argument ); f_hi = G_UX_MSD( &unpacked_argument ); if (exponent >= 0) { /* |arg| >= 1/2. */ if ( exponent >= 1 ) { /* |arg| >= 1. Check for arg <= -1 */ if (sign) { /* arg <= -1, start by forcing overflow */ P_UX_MSD( &unpacked_result, UX_MSB); P_UX_EXPONENT( &unpacked_result, UX_OVERFLOW_EXPONENT); if ((exponent == 1) && (f_hi == UX_MSB) && UX_LOW_FRACTION_IS_ZERO( &unpacked_argument )) /* This is -1. 
Force underflow */ P_UX_EXPONENT(&unpacked_result, UX_UNDERFLOW_EXPONENT); goto pack_it; } } goto big_argument; } else if (exponent <= -2) /* |arg| <= 1/4. */ goto small_argument; /* ** If we get here, 1/4 < |arg| < 1/2. We need to check see if ** 1/sqrt(2) < 1 + x < sqrt(2) */ f_hi = f_hi >> 2; f_hi = (sign) ? -f_hi : f_hi; f_hi += UX_MSB; if ( (UX_FRACTION_DIGIT_TYPE) (f_hi - I_RECIP_SQRT_2) >= (I_SQRT_2 - I_RECIP_SQRT_2)) goto big_argument; small_argument: /* ** If we get here, we know 1/sqrt(2) < 1 + x < sqrt(2). To ** avoid loss of significance, compute the reduced argument ** as x/(2+x) and evaluate the log polynomial. */ ADDSUB( UX_TWO, &unpacked_argument, ADD, &tmp); DIVIDE(&unpacked_argument, &tmp, FULL_PRECISION, &tmp); EVALUATE_RATIONAL( &tmp, LOG2_COEF_ARRAY, LOG2_COEF_ARRAY_DEGREE, NUMERATOR_FLAGS(SQUARE_TERM | POST_MULTIPLY), &unpacked_result ); MULTIPLY( &unpacked_result, LN_2, &unpacked_result); goto pack_it; big_argument: /* If we get here, just compute 1 + x and call the log */ ADDSUB( UX_ONE, &unpacked_argument, ADD, &unpacked_result); UX_LOG( &unpacked_result, LN_2, &unpacked_result); pack_it: PACK( &unpacked_result, PASS_RET_X_FLOAT(packed_result), LOG_OF_ZERO, LOG_OF_NEGATIVE OPT_EXCEPTION_INFO ); RETURN_X_FLOAT(packed_result); } #if defined(MAKE_INCLUDE) @divert -append divertText precision = ceil(UX_PRECISION/8) + 4; # undef TABLE_NAME START_TABLE; TABLE_COMMENT("log class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "LOG_CLASS_TO_ACTION_MAP"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(1) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_ERROR, 2) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_ERROR, 1) ); PRINT_U_TBL_ITEM( /* data 1 */ LOG_OF_NEGATIVE 
); PRINT_U_TBL_ITEM( /* data 2 */ LOG_OF_ZERO ); TABLE_COMMENT("log2 class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "LOG2_CLASS_TO_ACTION_MAP"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(1) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_ERROR, 2) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_ERROR, 1) ); PRINT_U_TBL_ITEM( /* data 1 */ LOG2_OF_NEGATIVE ); PRINT_U_TBL_ITEM( /* data 2 */ LOG2_OF_ZERO ); TABLE_COMMENT("log10 class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "LOG10_CLASS_TO_ACTION_MAP"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(1) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_NEG_NORM, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_ERROR, 2) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_ERROR, 1) ); PRINT_U_TBL_ITEM( /* data 1 */ LOG10_OF_NEGATIVE ); PRINT_U_TBL_ITEM( /* data 2 */ LOG10_OF_ZERO ); TABLE_COMMENT("log1p class-to-action-mapping"); PRINT_CLASS_TO_ACTION_TBL_DEF( "LOG1P_CLASS_TO_ACTION_MAP"); PRINT_64_TBL_ITEM( CLASS_TO_ACTION_DISP(1) + CLASS_TO_ACTION( F_C_SIG_NAN, RETURN_QUIET_NAN, 0) + CLASS_TO_ACTION( F_C_QUIET_NAN, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_INF, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_INF, RETURN_ERROR, 1) + CLASS_TO_ACTION( F_C_POS_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_DENORM, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_POS_ZERO, RETURN_VALUE, 0) + CLASS_TO_ACTION( F_C_NEG_ZERO , RETURN_VALUE, 0) ); PRINT_U_TBL_ITEM( /* data 1 */ LOG_OF_NEGATIVE); /* ** NOTE: the 
fraction fields of 1/sqrt(2) and sqrt(2) are identical, so ** that in the above code, the symbolic constants ONE_OVER_SQRT_2 and ** I_SQRT_2 have the same numerical value. */ TABLE_COMMENT("MSD of sqrt(2) and 1/sqrt(2) (in fixed point)"); tmp = trunc(bldexp(sqrt(2), BITS_PER_UX_FRACTION_DIGIT_TYPE - 1)); PRINT_UX_FRACTION_DIGIT_TBL_VDEF( "ONE_OVER_SQRT_2\t\t"); PRINT_UX_FRACTION_DIGIT_TBL_VDEF_ITEM( "I_SQRT_2\t\t", tmp); PRINT_UX_FRACTION_DIGIT_TBL_VDEF_ITEM( "I_RECIP_SQRT_2\t\t", trunc(tmp/2)); /* ** Now generate coefficients for computing log. */ zero_value = 2/log(2); function __log2(x) { if (x == 0) return zero_value; else return atanh(x)*zero_value/x; } save_precision = precision; precision = ceil(UX_PRECISION/8) + 8; max_arg = (sqrt(2) - 1)^2; TABLE_COMMENT("Fixed point coefficients for log2 evaluation"); remes(REMES_FIND_POLYNOMIAL + REMES_RELATIVE_WEIGHT + REMES_SQUARE_ARG, 0, max_arg, __log2, UX_PRECISION, &degree, &ux_rational_coefs); precision = save_precision; PRINT_FIXED_128_TBL_ADEF("LOG2_COEF_ARRAY\t\t"); PRINT_WORD_DEF("LOG2_COEF_ARRAY_DEGREE\t", degree); print_ux_rational_coefs(degree, 0, 0); TABLE_COMMENT("Unpacked constants 1, 2, log(2) and log(10)"); PRINT_UX_TBL_ADEF_ITEM( "UX_ONE", 1); PRINT_UX_TBL_ADEF_ITEM( "UX_TWO", 2); PRINT_UX_TBL_ADEF_ITEM( "LN_2", log(2)); PRINT_UX_TBL_ADEF_ITEM( "LOG10_2", log10(2)); END_TABLE; @end_divert @eval my $tableText; \ my $outText = MphocEval( GetStream( "divertText" ) ); \ my $defineText = Egrep( "#define", $outText, \$tableText ); \ $outText = "$tableText\n\n$defineText"; \ my $headerText = GetHeaderText( STR(BUILD_FILE_NAME), \ "Definitions and constants logarithmic" . \ " routines", __FILE__ ); \ print "$headerText\n\n$outText\n"; #endif LIBRARY/float128/dpml_ux_ops_64.c0000644€­ Q00042560000007211214616534611016443 0ustar aakkasintelall/****************************************************************************** Copyright (c) 2007-2024, Intel Corp. All rights reserved. 
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ******************************************************************************/ #include "dpml_ux.h" #if (NUM_UX_FRACTION_DIGITS != 2) # error "Must have 64 bit integers" #endif /* ** MULTIPLY essentially computes the high 128 bits of the product of two ** unpacked x-float values. The algorithm attempts to limit the number ** of integer multiplications performed. The resulting product has roughly ** a 6 lsb error bound in the worst case. 
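**
** The partial-product scheme described above can be sketched with plain
** 64-bit integers. This is an illustration under stated assumptions:
** sketch_umulh and sketch_mul_hi128 are hypothetical names, and the
** portable sketch_umulh stands in for whatever UMULH expands to on a
** given platform. As in the description, the low*low partial product and
** the low halves of the cross products are dropped, so the low digit of
** the result is only approximate.

```c
#include <assert.h>
#include <stdint.h>

// High 64 bits of the 128-bit product a*b, built from 32-bit pieces.
static uint64_t sketch_umulh(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;
    // Carries out of the middle 64 bits of the full product.
    uint64_t mid = (p0 >> 32) + (p1 & 0xffffffffu) + (p2 & 0xffffffffu);
    return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

// Approximate high 128 bits of (x_hi:x_lo) * (y_hi:y_lo).
static void sketch_mul_hi128(uint64_t x_hi, uint64_t x_lo,
                             uint64_t y_hi, uint64_t y_lo,
                             uint64_t *z_hi, uint64_t *z_lo)
{
    uint64_t lo, hi, p;

    assert(z_hi && z_lo);

    lo = y_hi * x_hi;               // low 64 bits of hi*hi
    p = sketch_umulh(y_hi, x_lo);   // high half of first cross product
    lo += p;
    hi = (lo < p);                  // carry out of the low digit
    p = sketch_umulh(y_lo, x_hi);   // high half of second cross product
    lo += p;
    hi += (lo < p);
    hi += sketch_umulh(y_hi, x_hi); // high 64 bits of hi*hi

    *z_hi = hi;
    *z_lo = lo;
}
```

** With both fractions equal to 2^127 (high digit 1 << 63, low digit 0),
** the sketch yields a high digit of 1 << 62 and a low digit of 0.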
*/ void MULTIPLY(UX_FLOAT * x, UX_FLOAT *y, UX_FLOAT *z) { U_WORD x_hi, x_lo, y_hi, y_lo, z_hi, z_lo, p1, p2; x_hi = G_UX_MSD(x); y_hi = G_UX_MSD(y); z_lo = y_hi*x_hi; x_lo = G_UX_LSD(x); y_lo = G_UX_LSD(y); UMULH(y_hi, x_lo, p2); P_UX_SIGN(z, G_UX_SIGN(x) ^ G_UX_SIGN(y)); P_UX_EXPONENT(z, G_UX_EXPONENT(x) + G_UX_EXPONENT(y)); UMULH(y_lo, x_hi, p1); z_lo += p2; z_hi = (z_lo < p2); UMULH(y_hi, x_hi, p2); z_lo = z_lo + p1; z_hi += (z_lo < p1); P_UX_LSD(z, z_lo); z_hi = z_hi + p2; P_UX_MSD(z, z_hi); } /* ** EXTENDED_MULTIPLY computes the exact 256 bit product of two unpacked ** x-float values. The result is stored in two unpacked x-float values ** containing the high and low 128 bits of the result */ void EXTENDED_MULTIPLY(UX_FLOAT * x, UX_FLOAT * y, UX_FLOAT * hi, UX_FLOAT * lo) { UX_EXPONENT_TYPE exponent; UX_SIGN_TYPE sign; UX_FRACTION_DIGIT_TYPE x_hi, x_lo, y_hi, y_lo, tmp_digit, carry, p1, p2; x_lo = G_UX_LSD(x); y_lo = G_UX_LSD(y); p1 = y_lo*x_lo; x_hi = G_UX_MSD(x); y_hi = G_UX_MSD(y); UMULH(y_lo, x_lo, tmp_digit); P_UX_LSD(lo, p1); sign = G_UX_SIGN(x) ^ G_UX_SIGN(y); exponent = G_UX_EXPONENT(x) + G_UX_EXPONENT(y); P_UX_SIGN(lo, sign); P_UX_EXPONENT(lo, exponent - 128); p1 = y_lo*x_hi; P_UX_SIGN(hi, sign); P_UX_EXPONENT(hi, exponent); p2 = y_hi*x_lo; P_UX_SIGN(lo, sign); P_UX_EXPONENT(lo, exponent - 128); tmp_digit += p1; carry = (tmp_digit < p1); p1 = x_hi*y_hi; tmp_digit += p2; carry += (tmp_digit < p2); P_UX_MSD(lo, tmp_digit); UMULH(y_hi, x_lo, p2); tmp_digit = p1 + carry; carry = (tmp_digit < p1); UMULH(y_lo, x_hi, p1); tmp_digit += p2; carry += (tmp_digit < p2); UMULH(y_hi, x_hi, p2); tmp_digit += p1; carry += (tmp_digit < p1); P_UX_LSD(hi, tmp_digit); tmp_digit = p2 + carry; P_UX_MSD(hi, tmp_digit); } /* ** This routine divides two unpacked numbers: ** ** o The 'flags' argument controls whether a FULL or HALF precision ** result is generated. 
** o If the pointer to one of the unpacked arguments is 0, then that ** argument is implicitly treated as being equal to 1. ** o both argument pointers *CANNOT* be zero. ** ** A detailed description of the algorithm is presented in note 6.2 of the ** X_FLOAT notes conference. Note that to the extent possible, the variable ** names in this routine were chosen to match the description in the design ** note. In particular, upper case names imply 64 bit integer data types, while ** double precision values are denoted with lower case names. */ #define _D_POW_2(n) ((double) ((U_WORD)1 << n)) #define TWO_POW_62 (_D_POW_2(62)) #define TWO_POW_124 (TWO_POW_62*TWO_POW_62) #define RECIP_TWO_POW_16 (1./_D_POW_2(16)) #define RECIP_TWO_POW_60 (1./_D_POW_2(60)) #define RECIP_TWO_POW_184 (4./(TWO_POW_124 * TWO_POW_62 )) static const UX_FLOAT __ux_one__ = { 0, 1, ((U_WORD) 1 << 63), 0 }; void DIVIDE( UX_FLOAT * aPtr, UX_FLOAT * bPtr, U_WORD flags, UX_FLOAT * cPtr) { UX_EXPONENT_TYPE exponent; UX_FRACTION_DIGIT_TYPE A1, A2, B1, B2, Q1, Q2, S, R, P00, P01, P11, N0, N1, N2, C1, mask, E; D_TYPE r, b_hi, b_lo, r_hi, r_lo, a_hi, a_lo, a, q_hi, q_lo; /* ** for performance reasons, pre-load some of the interesting items even ** though we might not actually use them. Specifically, loading B1 ** and B2 before the normalization check allows the compiler to better ** schedule the code after the check. */ bPtr = (bPtr == 0) ? (UX_FLOAT *)&__ux_one__ : bPtr; aPtr = (aPtr == 0) ? (UX_FLOAT *)&__ux_one__ : aPtr; B1 = G_UX_MSD(bPtr); B2 = G_UX_LSD(bPtr); if (bPtr == &__ux_one__) { UX_COPY(aPtr, cPtr); return; } /* ** If b isn't normalized, then the whole algorithm falls apart. So make ** sure that b is normalized. */ if ((UX_SIGNED_FRACTION_DIGIT_TYPE) B1 >= 0) { NORMALIZE(bPtr); B1 = G_UX_MSD(bPtr); B2 = G_UX_LSD(bPtr); } /* ** The first step is to estimate 1/b in double precision to more than 70 ** bits. 
This is done by getting an initial estimate to 1/b and using a ** variation of Newton's iteration to improve the accuracy. The basic ** approach is ** ** b' = high 53 bits of b ** b_hi' = high 26 bits of b ** b_lo' = bits 27 through 80 of b ** ** r' = 1/b' ** r_hi' = high 26 bits of r ** r_lo' = [ (1 - b_hi'*r_hi') - b_lo'*r_hi'] * r' ** ** However, there is a certain amount of weird scaling of the values that ** takes place to deal with the integer to float conversion and subsequent ** uses of the results. ** ** Note that the two macros below are used to convert *signed* integers ** to and from double precision. We use signed conversions because they ** are generally faster than unsigned conversions. */ # define TO_DOUBLE(a) ((double) ((UX_SIGNED_FRACTION_DIGIT_TYPE) (a))) # define TO_DIGIT(a) ((UX_SIGNED_FRACTION_DIGIT_TYPE) (a)) r = TWO_POW_124 / TO_DOUBLE( B1 >> 1 ); /* ** While the divide is going on, we can compute all sorts of stuff */ mask = MAKE_MASK( 38, 0 ); b_hi = TO_DOUBLE((B1 & ~mask) >> 1); b_lo = RECIP_TWO_POW_16 * TO_DOUBLE(((B1 & mask) << 15) | (B2 >> 49)); A1 = G_UX_MSD(aPtr); A2 = G_UX_LSD(aPtr); P_UX_SIGN( cPtr, G_UX_SIGN(aPtr) ^ G_UX_SIGN(bPtr) ); exponent = G_UX_EXPONENT(aPtr) - G_UX_EXPONENT(bPtr); /* ** Get the high part of r as both an integer and a floating point value. ** In the process, bias r_hi downward to ensure that r_lo is positive. ** (See the design note for details.) 
*/ R = TO_DIGIT( r ); R = (R - (5 << 8)) & ~MAKE_MASK( 36, 0 ); r_hi = TO_DOUBLE(R); /* ** At this point we have: ** ** r = 2^61 * r' ** r_hi = 2^61 * r_hi' ** b_hi = 2^63 * b_hi' ** b_lo = 2^63 * b_lo' ** ** so that ** ** 2*r_lo' = [ (2^124 - b_hi*r_hi) - b_lo*r_hi ] * (r/2^184) */ r_lo = D_GROUP(D_GROUP((TWO_POW_124) - b_hi*r_hi) - (b_lo*r_hi)) * (RECIP_TWO_POW_184*r); /* ** Now that we have 1/b ~ r_hi' + r_lo' (scaling notwithstanding), we can ** compute an approximation to q = a/b = a*(1/b), where the product is ** performed in high and low pieces: ** ** q = (a_hi' + a_lo') * (r_hi' + r_lo') ** = a_hi' * r_hi' + [ a_lo' * r_hi' + (a_hi' + a_lo') * r_lo' ] ** = a_hi' * r_hi' + [ a_lo' * r_hi' + a' * r_lo' ] ** = q_hi' + q_lo' ** ** Note that in the above, we want to ensure that a' ~ a_hi' + a_lo' is ** less than the actual value of a to ensure that the computed value of ** q is less than 2. */ a = TO_DOUBLE( (A1 >> 11) << 10 ); a_hi = TO_DOUBLE( (A1 & ~mask) >> 1); a_lo = RECIP_TWO_POW_16 * TO_DOUBLE(((A1 & mask) << 15) | (A2 >> 49)); r_hi = RECIP_TWO_POW_60 * r_hi; q_hi = a_hi*r_hi; q_lo = a_lo*r_hi + a*r_lo; /* ** With the above conversions and computations we have ** ** a = 2^63*a' ** a_hi = 2^63*a_hi' ** a_lo = 2^63*a_lo' ** r_hi = 2*r_hi' ** r_lo = 2*r_lo' ** q_hi = 2^64 * q_hi' ** q_lo = 2^64 * q_lo' ** ** We would like to convert the high 65 bits of q_hi + q_lo into integers, ** S' and Q1'. Note that converting q_hi to an integer can cause an ** overflow. However since q_hi contains only 52 significant bits, we ** can convert .25 * q_hi instead which won't overflow. */ Q1 = TO_DIGIT(.25 * q_hi); E = TO_DIGIT( q_lo ); S = ( Q1 >> 62 ); Q1 = (4*Q1) + E; S += (Q1 < E); Q2 = 0; if (flags == HALF_PRECISION) goto pack_it; /* ** While we're at it, compute an integer approximation to 1/b. I.e. get ** an integer R such that R/2^63 ~ 1/b. 
** ** R = 2^63 * (r_hi' + r_lo' ) ** = 2^63 * r_hi' + 2^63 * r_lo' ** = 2^63 * r_hi' + 2^62 * r_lo ** ** Recall that in the original computation of r_hi, we previously computed ** the integer value R as 2^61*r_hi', so that we can now compute ** ** R <-- 4*R + 2^62 * r_lo ** ** Note that for b very close to 1/2, R will be 2^64 which can't be ** represented in 64 bits. In this case, we take R = 2^64 - 1 which is ** close enough and can be represented in 64 bits. */ R = (R << 2) + TO_DIGIT( TWO_POW_62*r_lo ); R = ( R == 0 ) ? ( (UX_SIGNED_FRACTION_DIGIT_TYPE) -1 ) : R; /* ** Using S and Q1 as the current guess for the high 65 bits of the result ** compute the remainder: ** ** +----------+----------+ ** | A1 | A2 | 2^128*(2^64*A1 + A2) ** +----------+----------+ ** ** +----------+----------+ ** | B1 | B2 | s'*2^128*(2^64*B1 + B2) ** +----------+----------+ ** | Q1'*B1 | 2^128*Q1'*B1 ** +----------+----------+----------+ ** | Q1'*B2 | 2^64*Q1'*B2 ** +----------+----------+ ** ** +----------+----------+----------+----------+ ** | N0' | N1' | N2' | N3' | ** +----------+----------+----------+----------+ ** ** Start by summing all the products into N0:N1:N2:N3 ** ** NOTE: for performance reasons, we don't actually ** compute N3' */ mask = -S; UMULH( Q1, B2, P11 ); P01 = Q1 * B1; UMULH( Q1, B1, P00 ); N2 = B2 & mask; /* N2/N1 = B2/B1 if S = 1, 0 otherwise */ N1 = B1 & mask; N2 += P11; C1 = (N2 < P11); N2 += P01; C1 += (N2 < P01); N1 += P00; N0 = (N1 < P00); N1 += C1; N0 += (N1 < C1); /* Subtract the sum from A1:A2 */ N0 = -N0; C1 = (A2 < N2); N2 = A2 - N2; N0 -= (A1 < N1); N1 = A1 - N1; N0 -= (N1 < C1); N1 -= C1; /* ** Since the original estimate to S:Q1 was good to more than 70 bits, the ** current value of S:Q1 can be off by at most one. By looking at the ** values of N0 and N1, we can determine an adjustment, E, to S:Q1. ** With the adjusted S:Q1 we know that N0 = N1 = 0, so we only need to ** adjust N2. */ E = (N0 | (N1 != 0)); mask = (E == 0) ? 
B1 : N0; N2 = N2 - (mask ^ B1); /* ** Using R/2^63 ~ 1/b and the adjusted N2, compute an approximation to Q2 ** Note that if Q2 has its high bit set, then the original value of E was ** one too low. */ UMULH( R, N2, Q2 ); E += ( ( (UX_SIGNED_FRACTION_DIGIT_TYPE) Q2 ) < 0); Q2 = 2*Q2 + ((A1 | A2) != 0); /* Make sure 0/b is zero */ /* Adjust S and Q1 using the final value of E */ Q1 += E; S = S + (((UX_SIGNED_FRACTION_DIGIT_TYPE) E) >> 63) + (Q1 < E); /* Last but not least, pack it */ pack_it: P_UX_MSD( cPtr, (S << 63) | (Q1 >> S) ); P_UX_LSD( cPtr, ((Q1 & S) << 63) | (Q2 >> S) ); P_UX_EXPONENT(cPtr, exponent + S); return; } /* ** ** The following two routines evaluate polynomials, P(x), via Horner's ** scheme for positive x: ** ** s(k) <-- c(k) +/- x*s(k+1) for k = n-1, ..., 0 ** ** where the c(k)'s are the polynomial coefficients and s(n) = c(n). The ** arguments to these routines (not in order) are ** ** x a pointer to the unpacked bits of x ** cnt the degree of the polynomial ** coef A pointer to pairs of quadwords specifying the hi/lo ** bits of the coefficient. We assume the coefficients ** are stored in reverse order: c(n) to c(0) ** shift cnt*(x->exp) - This is passed in rather than computed ** here since, on the calling side, cnt is a known ** constant, so the multiply can be done by shifts and ** adds rather than a real integer multiply. ** p a pointer to the unpacked result. ** ** The routines return the high bits of the result. ** ** IMPORTANT ASSUMPTIONS: ** ###################### ** ** o This routine assumes that the terms of the polynomial are decreasing. ** I.e. that c(k) > x*s(k+1) for all k. ** ** o shift = cnt*(x->exp), so that if shift is decremented by x->exp ** each time cnt decremented, then shift will become 0 before cnt ** becomes negative. 
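**
** The recurrence above, stripped of the fixed-point shifting, can be
** sketched in double precision. This is an illustration only:
** sketch_horner is a hypothetical name, and the coefficients are plain
** doubles stored in reverse order, c(n) first, as described above. The
** alternate flag selects the "-" form of the recurrence.

```c
#include <assert.h>

// s(n) = c(n);  s(k) = c(k) +/- x*s(k+1)  for k = n-1, ..., 0.
// coef[0] holds c(n) and coef[degree] holds c(0).
static double sketch_horner(const double *coef, int degree,
                            double x, int alternate)
{
    double s;
    int i;

    assert(coef && degree >= 0);

    s = coef[0];                    // s(n) = c(n)
    for (i = 1; i <= degree; i++) {
        if (alternate)
            s = coef[i] - x * s;    // alternating-sign polynomial
        else
            s = coef[i] + x * s;    // all-positive polynomial
    }
    return s;
}
```

** For coefficients {1, 2, 4} (so c(2) = 1, c(1) = 2, c(0) = 4) and
** x = 0.5, the "+" form gives 5.25 and the "-" form gives 3.25; both
** satisfy the decreasing-terms assumption c(k) > x*s(k+1).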
*/ static void __eval_pos_poly(UX_FLOAT * x, WORD shift, FIXED_128 * coef, WORD cnt, UX_FLOAT * p) { UX_FRACTION_DIGIT_TYPE c_hi, c_lo, s_hi, s_lo, p1, p2; UX_FRACTION_DIGIT_TYPE x_hi, x_lo, carry; UX_EXPONENT_TYPE exponent; WORD shift_inc; /* Initialize internal copies and accumulators */ x_hi = G_UX_MSD(x); x_lo = G_UX_LSD(x); shift_inc = G_UX_EXPONENT(x); s_lo = s_hi = 0; /* ** If the shift count is >= 128, then this product won't contribute to ** the final product. Skip over all of the coefficients that correspond ** to large shifts */ if (shift < 128) goto p_check_shift_64_to_127; p_shift_ge_128: shift += shift_inc; coef++; cnt--; if (shift >= 128) goto p_shift_ge_128; //printf("Eval_pos_poly, shift=%lld !!\n",shift); /* ** Each time through this loop, c_hi = 0. Since we assume that c(k) > ** x*s(k+1), if there is a carry out on the sum s(k) = c(k) + x*s(k+1), ** then the shift count for the next iteration must be less than 64. ** Consequently, we need only worry about the carry out from the sum ** when we leave this loop. That means each time we enter the top of ** the loop, both c_hi and s_hi = 0; */ p_check_shift_64_to_127: if (shift < 64) goto p_check_shift_1_to_63; /* ** Depending on the size of shift_inc and the rate at which the ** coefficients decrease, several of the next Horner's scheme iterations ** will yield zero results, so there is no need to do the multiply. ** Since multiplies are likely to be expensive, we check for this case ** and skip over them. */ if (s_lo) goto p_shift_64_to_127; p_shift_64_to_127_zero_loop: s_lo = coef->digits[1] >> (shift - 64); //printf("s_lo, sh, sh_inc, c: %llx, %llx, %llx, %llx (%llx)\n",s_lo,shift, shift_inc,coef->digits[1],coef->digits[0]); shift += shift_inc; coef++; cnt--; if (shift < 64) goto p_check_shift_1_to_63; if (s_lo == 0) goto p_shift_64_to_127_zero_loop; /* ** s_lo is no longer zero, so do the multiply and accumulate the ** products. 
*/ p_shift_64_to_127: //printf("s_lo,x_hi,p1: %llx, %llx, %llx\n",s_lo,x_hi,p1); UMULH(s_lo, x_hi, p1); //printf("s_lo,x_hi,p1: %llx, %llx, %llx\n",s_lo,x_hi,p1); c_lo = coef->digits[1] >> (shift - 64); shift += shift_inc; coef++; cnt--; s_lo = c_lo + p1; if (shift >= 64) goto p_shift_64_to_127; /* Set carry out from last add */ s_hi = (s_lo < p1); /* ** When shift = 0, the complementary shift is 64. ANSI C does not ** specify the result of a shift by 64, so we need to handle this as ** a special case. */ p_check_shift_1_to_63: exponent = 0; if (shift == 0) goto p_shift_eq_0; /* ** Depending on the size of shift_inc and the rate at which the ** coefficients decrease, several of the next Horner's scheme iterations ** will yield zero results for s_hi, so there is no need to do the ** multiplies associated with s_hi. Since multiplies are likely to be ** expensive, we check for this case and skip over them. */ if (s_hi) goto p_shift_1_to_63; p_shift_1_to_63_zero_loop: UMULH(s_lo, x_hi, p1); c_hi = coef->digits[1]; c_lo = coef->digits[0]; c_lo = (c_lo >> shift) | (c_hi << (64 - shift)); s_hi = c_hi >> shift; shift += shift_inc; coef++; cnt--; s_lo = c_lo + p1; s_hi += (s_lo < p1); if (shift == 0) goto p_shift_eq_0; if (s_hi == 0) goto p_shift_1_to_63_zero_loop; p_shift_1_to_63: while (cnt >= 0) { p1 = s_hi*x_hi; c_hi = coef->digits[1]; c_lo = coef->digits[0]; c_lo = (c_lo >> shift) | (c_hi << (64 - shift)); c_hi >>= shift; UMULH(s_hi, x_lo, p2); c_lo += p1; carry = (c_lo < p1); cnt--; UMULH(s_lo, x_hi, p1); c_lo += p2; carry += (c_lo < p2); shift += shift_inc; UMULH(s_hi, x_hi, p2); s_lo = c_lo + p1; carry += (s_lo < p1); c_hi += carry; carry = (c_hi < carry); coef++; s_hi = c_hi + p2; carry += (s_hi < p2); if (carry) { s_lo = (s_lo >> 1) | (s_hi << 63); s_hi = (s_hi >> 1) | SET_BIT(63); shift++; exponent++; } if (shift == 0) break; } p_shift_eq_0: while (cnt >= 0) { p1 = s_hi*x_hi; c_hi = coef->digits[1]; c_lo = coef->digits[0]; UMULH(s_hi, x_lo, p2); c_lo += p1; carry 
= (c_lo < p1); cnt--; UMULH(s_lo, x_hi, p1); c_lo += p2; carry += (c_lo < p2); UMULH(s_hi, x_hi, p2); s_lo = c_lo + p1; carry += (s_lo < p1); c_hi += carry; carry = (c_hi < carry); coef++; s_hi = c_hi + p2; carry += (s_hi < p2); if (carry) { s_lo = (s_lo >> 1) | (s_hi << 63); s_hi = (s_hi >> 1) | SET_BIT(63); shift = 1; exponent++; if (cnt >= 0) goto p_shift_1_to_63; } } P_UX_LSD(p, s_lo); P_UX_MSD(p, s_hi); P_UX_EXPONENT(p, exponent); P_UX_SIGN(p, 0); } static void __eval_neg_poly(UX_FLOAT * x, WORD shift, FIXED_128 * coef, WORD cnt, UX_FLOAT * p) { UX_FRACTION_DIGIT_TYPE c_hi, c_lo, s_hi, s_lo, p1, p2, tmp; UX_FRACTION_DIGIT_TYPE x_hi, x_lo; WORD shift_inc; x_hi = G_UX_MSD(x); x_lo = G_UX_LSD(x); shift_inc = G_UX_EXPONENT(x); s_lo = s_hi = 0; if (shift < 128) goto n_check_shift_64_to_127; /* Skip over all the big shifts */ n_shift_ge_128: shift += shift_inc; coef++; cnt--; if (shift >= 128) goto n_shift_ge_128; /* * Each time through this loop, c_hi = 0. Since we assume that c(k) > * x*s(k+1), s(k) = c(k) - x*s(k+1) < c(k). Consequently, there is * no borrow from the computation of s(k) into its high 64 bits. * That means each time we enter the top of the loop, both c_hi and * s_hi = 0; */ n_check_shift_64_to_127: if (shift < 64) goto n_check_shift_1_to_63; /* * Depending on the size of shift_inc and the rate at which the * coefficients decrease, several of the next Horner's scheme iterations * will yield zero results, so there is no need to do the multiply. * Since multiplies are likely to be expensive, we check for this case * and skip over them. */ if (s_lo) goto n_shift_64_to_127; n_shift_64_to_127_zero_loop: s_lo = coef->digits[1] >> (shift - 64); shift += shift_inc; coef++; cnt--; if (shift < 64) goto n_check_shift_1_to_63; if (s_lo == 0) goto n_shift_64_to_127_zero_loop; /* * s_lo is no longer zero, so do the multiply and accumulate the * products. 
*/ n_shift_64_to_127: UMULH(s_lo, x_hi, p1); c_lo = coef->digits[1] >> (shift - 64); shift += shift_inc; coef++; cnt--; s_lo = c_lo - p1; if (shift >= 64) goto n_shift_64_to_127; /* * When shift = 0, the complementary shift is 64. ANSI C does not * specify the result of a shift by 64, so we need to handle this as * a special case. */ n_check_shift_1_to_63: if (shift == 0) goto n_shift_eq_0; /* * Depending on the size of shift_inc and the rate at which the * coefficients decrease, several of the next Horner's scheme iterations * will yield zero results for s_hi, so there is no need to do the * multiplies associated with s_hi. Since multiplies are likely to be * expensive, we check for this case and skip over them. */ if (s_hi) goto n_shift_1_to_63; n_shift_1_to_63_zero_loop: UMULH(s_lo, x_hi, p1); c_hi = coef->digits[1]; c_lo = coef->digits[0]; c_lo = (c_lo >> shift) | (c_hi << (64 - shift)); s_hi = (c_hi >> shift); shift += shift_inc; coef++; cnt--; s_lo = c_lo - p1; s_hi -= (s_lo > c_lo); if (shift == 0) goto n_shift_eq_0; if (s_hi == 0) goto n_shift_1_to_63_zero_loop; n_shift_1_to_63: p1 = s_hi*x_hi; c_hi = coef->digits[1]; c_lo = coef->digits[0]; c_lo = (c_lo >> shift) | (c_hi << (64 - shift)); c_hi >>= shift; UMULH(s_hi, x_lo, p2); tmp = c_lo - p1; c_hi -= (tmp > c_lo); cnt--; UMULH(s_lo, x_hi, p1); c_lo = tmp - p2; c_hi -= (c_lo > tmp); shift += shift_inc; UMULH(s_hi, x_hi, p2); s_lo = c_lo - p1; c_hi -= (s_lo > c_lo); coef++; s_hi = c_hi - p2; if (shift) goto n_shift_1_to_63; n_shift_eq_0: while (cnt >= 0) { p1 = s_hi*x_hi; c_hi = coef->digits[1]; c_lo = coef->digits[0]; UMULH(s_hi, x_lo, p2); tmp = c_lo - p1; c_hi -= (tmp > c_lo); cnt--; UMULH(s_lo, x_hi, p1); c_lo = tmp - p2; c_hi -= (c_lo > tmp); UMULH(s_hi, x_hi, p2); s_lo = c_lo - p1; c_hi -= (s_lo > c_lo); coef++; s_hi = c_hi - p2; } P_UX_LSD(p, s_lo); P_UX_MSD(p, s_hi); P_UX_EXPONENT(p, 0); P_UX_SIGN(p, 0); } /* ** EVALUATE_RATIONAL is a driver routine for the two polynomial evaluation ** routines. 
** Even though it is architecture and word size independent, it
** is included in this file to increase "locality".
**
** EVALUATE_RATIONAL generally computes a rational approximation; however,
** by specifying the appropriate set of flags, one or two polynomial
** evaluations can be performed.
**
** The following flags are used to independently control the "form" of the
** numerator and denominator polynomials:
**
**      SQUARE_TERM
**      ALTERNATE_SIGN
**      POST_MULTIPLY
**      STANDARD
**
** The following flags control whether or not a rational approximation is
** performed and what form it has:
**
**      SWAP
**      SKIP
**      NO_DIVIDE
**
** If the SKIP flag is specified in conjunction with the flags for either
** the numerator or denominator being zero, only one part of a rational
** will be evaluated.
*/

#define EITHER(n)               (DENOMINATOR_FLAGS(n) | NUMERATOR_FLAGS(n))
#define NUMERATOR_MASK          NUMERATOR_FLAGS(MAKE_MASK(NUM_DEN_FIELD_WIDTH, 0))
#define DENOMINATOR_MASK        DENOMINATOR_FLAGS(MAKE_MASK(NUM_DEN_FIELD_WIDTH, 0))
#define UPDATE_COEF_PTR(c,d)    (c) = ((FIXED_128 *)((char *) (c) + (d)))
#define G_EXPONENT(c)           ((UX_EXPONENT_TYPE) ((WORD *) (c))[-1])

void
EVALUATE_RATIONAL(
    UX_FLOAT * argument,
    FIXED_128 * coefficients,
    U_WORD degree,
    U_WORD flags,
    UX_FLOAT * result)
{
    WORD tmp;
    WORD sign, shift, byte_length, poly_shift;
    UX_EXPONENT_TYPE exponent;
    UX_FLOAT * first_result, *second_result, arg_squared, *poly_arg;
    void (* poly_func)(UX_FLOAT *, WORD, FIXED_128 *, WORD, UX_FLOAT *);

    /* Scale the argument and square it if needed */
    sign = flags;
    UX_INCR_EXPONENT(argument, G_SCALE(flags));
    if (flags & EITHER(SQUARE_TERM)) {
        poly_arg = &arg_squared;
        MULTIPLY(argument, argument, &arg_squared);
    } else {
        poly_arg = argument;
        tmp = G_UX_SIGN(argument) ? EITHER(ALTERNATE_SIGN) : 0;
        sign = flags ^ tmp;
    }

    /* Start calculation of shift parameter.
    */
    NORMALIZE(poly_arg);
    exponent = G_UX_EXPONENT(poly_arg);
    P_UX_EXPONENT(poly_arg, exponent);
    shift = -degree*exponent;
    byte_length = (degree + 1)*sizeof(FIXED_128) + sizeof(WORD);

    /* allocate locations for 1st and 2nd result */
    tmp = (((flags & SWAP) == 0) || (flags & SKIP)) ? 0 : 1;
    first_result = result + tmp;
    second_result = result + 1 - tmp;

    if (NUMERATOR_MASK & flags) {
        //printf("NUMERATOR_MASK !!\n");
        poly_func = (ALTERNATE_SIGN & sign) ? __eval_neg_poly : __eval_pos_poly;
        first_result = (DENOMINATOR_MASK & flags) ? first_result : result;
        poly_func( poly_arg, shift, coefficients, degree, first_result);
        //printf("f_result= (%x %x) %llx %llx\n",first_result->sign,first_result->exponent,first_result->fraction[0],first_result->fraction[1]);
        //printf("fl & NUMERATOR_FLAGS(POST_MULTIPLY) = %llx (%llx)\n", flags & NUMERATOR_FLAGS(POST_MULTIPLY), flags);
        if (flags & NUMERATOR_FLAGS(POST_MULTIPLY))
            MULTIPLY(argument, first_result, first_result);
        //printf("result..= (%x %x) %llx %llx\n",result->sign,result->exponent,result->fraction[0],result->fraction[1]);
        UPDATE_COEF_PTR(coefficients, byte_length);
        UX_INCR_EXPONENT(first_result, G_EXPONENT(coefficients));
    } else {
        second_result = result;
        flags |= NO_DIVIDE;
        if ( flags & SKIP )
            UPDATE_COEF_PTR(coefficients, byte_length);
    }

    if (DENOMINATOR_MASK & flags) {
        //printf("DENOMINATOR_MASK !!\n");
        poly_func = ( DENOMINATOR_FLAGS(ALTERNATE_SIGN) & sign ) ?
            __eval_neg_poly : __eval_pos_poly;
        poly_func( poly_arg, shift, coefficients, degree, second_result);
        if (flags & DENOMINATOR_FLAGS(POST_MULTIPLY))
            MULTIPLY(argument, second_result, second_result);
        UPDATE_COEF_PTR(coefficients, byte_length);
        UX_INCR_EXPONENT(second_result, G_EXPONENT(coefficients));
        if ( flags & SKIP )
            /* Numerator was skipped, we're done */
            return;
    } else {
        flags |= NO_DIVIDE;
        if ( flags & SKIP )
            UPDATE_COEF_PTR(coefficients, byte_length);
    }

    //printf("fl & NO_DIV = %llx\n", flags & NO_DIVIDE);
    //printf("result0= (%x %x) %llx %llx\n",result->sign,result->exponent,result->fraction[0],result->fraction[1]);
    if ((flags & NO_DIVIDE) == 0)
        DIVIDE(result, result + 1, FULL_PRECISION, result);
}

#if 0
U_INT_64 __umulh( U_INT_64 i, U_INT_64 j ) {
    U_INT_64 k;
    {
        U_INT_64 iLo, iHi, jLo, jHi, p0, p1, p2;
        iLo = __LO(i);
        iHi = __HI(i);
        jLo = __LO(j);
        jHi = __HI(j);
        p0 = iLo * jLo;
        p1 = (iLo * jHi);
        p2 = (iHi * jLo) + __HI(p0) + __LO(p1);
        k = (iHi * jHi) + __HI(p1) + __HI(p2);
    }
    return k;
}
#endif

LIBRARY/float128/dpml_pow_cons.c

/******************************************************************************
  Copyright (c) 2007-2024, Intel Corp.
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are met:

    * Redistributions of source code must retain the above copyright notice,
      this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    * Neither the name of Intel Corporation nor the names of its contributors
      may be used to endorse or promote products derived from this software
      without specific prior written permission.

  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
  THE POSSIBILITY OF SUCH DAMAGE.
******************************************************************************/

#define ENDIF foo = 1; /* Explain this */

/* File: dpml_pow_cons.c */

/*
** Facility:
**
**      DPML
**
** Abstract:
**
**      This file is used to generate common include files for the
**      DPML functions that are related to the exp function.  Currently
**      the generated file is shared by:
**
**              o exp (fast and accurate)
**              o pow (fast and accurate)
**              o expm1
**              o sinh and cosh
**
**      Where appropriate, this file also contains brief descriptions of
**      the algorithms used in the above functions.
**
** Modification History:
**
**      1-001   Initial implementation.  Martha Jaffe 27-May-1994.
**
**      2-001   Initial implementation.  RNH 01-Feb-95
**      2-002   Added hi-limit check const for exp2.  MJ 10-Dec-98
**      2-003   Added 'rm TMP_FILE'.  RNH 04-Sep-2002
*/

/*
** SUMMARY OF BUILD INFORMATION
** ----------------------------
**
** Since the total size of the constants and tables required to build the power
** routines is large, by default we assume that the constants will be shared
** whenever possible between data types and functions.  Switches are provided
** to override the default sharing behavior.
**
** Also, there is a switch to determine if the argument reduction scheme for
** the accurate power routine uses a divide operation or not.  The default is
** to not use divide.
**
** The following table summarizes the supported switches
**
**      Switch          Meaning
**      -----------     -------------------------------------------------
**      NO_FAST         Don't generate values for the fast routines.
**
**      NO_ACC          Don't generate values for the accurate routines.
**
**      ONE_TYPE        Only generate values for the specified type
**
**      USE_DIVIDE      Generate constant necessary for doing the log argument
**                      reduction using division
**
** The default values of the above switches are a function of data type:
**
**                              Default
**                      ---------------------
**      Switch          Single  Double  Quad
**      -----------     ---------------------
**      NO_FAST         False   False   True
**      NO_ACC          False   False   False
**      ONE_TYPE        False   False   True
**      USE_DIVIDE      False   False   True
**
**
**      NOTE: when sharing the generated table between types,
**      the larger precision type must be specified when
**      processing this file.
**
** In addition to the above build flags, users can also specify the size
** (actually, the log2 of the size) of the exp and log tables by defining
** POW2_K and LOG2_K respectively.  The default values are POW2_K = 8 and
** LOG2_K = 7.  The implications of changing these values are discussed
** below.
** (Look for the string "DEFINING THE TABLE SIZES");
*/

#if defined X_FLOAT
#   define _X_FLT_DEF 1
#else
#   define _X_FLT_DEF 0
#endif

#if defined(NO_FAST)
#   undef NO_FAST
#   define NO_FAST 1
#else
#   define NO_FAST _X_FLT_DEF
#endif

#if defined(NO_ACC)
#   undef NO_ACC
#   define NO_ACC 1
#else
#   define NO_ACC 0
#endif

#if defined(ONE_TYPE)
#   undef ONE_TYPE
#   define ONE_TYPE 1
#else
#   define ONE_TYPE _X_FLT_DEF
#endif

#if defined(USE_DIVIDE)
#   undef USE_DIVIDE
#   define USE_DIVIDE 1
#else
#   define USE_DIVIDE _X_FLT_DEF
#endif

#if NO_FAST && NO_ACC
#   error "ERROR: Can't define both NO_FAST and NO_ACC"
#endif

#if USE_DIVIDE && NO_ACC
#   error "ERROR: USE_DIVIDE only valid for accurate pow"
#endif

/*
 * MAKE_INCLUDE and MAKE_COMMON are always defined for this file.
 */
#undef  MAKE_INCLUDE
#define MAKE_INCLUDE
#undef  MAKE_COMMON
#define MAKE_COMMON

/*
 * Pick up default names
 */
#define __POW_BASE_NAME POW_BASE_NAME

#ifndef BASE_NAME
#   define BASE_NAME __POW_BASE_NAME
#endif

#if defined(MAKE_COMMON)
#   define POW_TABLE_NAME F_POW_TABLE_NAME
#   define _BUILD_FILE_NAME F_POW_BUILD_FILE_NAME
#else
#   define POW_TABLE_NAME __F_TABLE_NAME(POW_TABLE_BASE_NAME)
#   define _BUILD_FILE_NAME __BUILD_FILE_NAME(POW_TABLE_BASE_NAME)
#endif

#if !defined(BUILD_FILE_NAME)
#   define BUILD_FILE_NAME _F_POW_BUILD_FILE_NAME
#endif

#if !defined(TABLE_NAME)
#   define TABLE_NAME POW_TABLE_NAME
#endif

/*
 * Get default setting for table sizes
 */
#if !defined(LOG2_K)
#   define LOG2_K 7
#endif

#if !defined(POW2_K)
#   define POW2_K 8
#endif

/*
** Set types for default print macros.  Also set the flag to pick up the
** latest version of the mphoc macros.
*/
#define MP_T_TYPE       B_TYPE
#define MP_T_CHAR       B_CHAR
#define MP_T_PRECISION  B_PRECISION
#define NEW_DPML_MACROS 1

#include "dpml_private.h"
#include "dpml_pow.h"

#if !ONE_TYPE && (R_PRECISION + R_EXP_WIDTH + POW2_K - 1 > F_PRECISION)
#   error "ERROR: Floating types incompatible for shared tables"
#endif

/*
** ORGANIZATION OF THE GENERATED FILE
** ----------------------------------
**
** The size of the table in the generated file is quite large, and for the
** default values, the single/double precision table is greater than 8k in
** size.  In order to help eliminate cache misses and ease finding problems
** with this code and the values in the tables, the table is laid out as
** follows:
**
**      +---------------------------------------+
**      |                                       |
**      |                                       |
**      |    table of 2^(j/2^POW2_K) values     |
**      |                                       |
**      |                                       |
**      +---------------------------------------+
**      | Constants for fast exp                |
**      +---------------------------------------+
**      | Constants for 2^x portion of fast pow |
**      +---------------------------------------+
**      | Constants for 2^x portion of acc pow  |
**      +---------------------------------------+
**      | Constants for acc exp                 |
**      +---------------------------------------+
**      | Constants for expm1                   |
**      +---------------------------------------+
**      | Constants for sinh/cosh               |
**      +---------------------------------------+
**      | Miscellaneous shared Constants        |
**      +---------------------------------------+
**      | Constants for log2 portion of pow     |
**      +---------------------------------------+
**      |                                       |
**      |                                       |
**      |  table of log(1 + j/2^LOG2_K) values  |
**      |                                       |
**      |                                       |
**      +---------------------------------------+
**
*/

@divert divertText

/*
** GENERATING POLYNOMIAL COEFFICIENTS:
** -----------------------------------
**
** All of the polynomial coefficients in this file are generated via the
** Remes min/max error algorithm.  This algorithm takes as one of its input
** arguments, the function to be approximated, F(x).
** For example, if we
** look at generating the exp and pow polynomials, F(x) can be one of e^x,
** (e^x - 1)/x, [e^x - (1 + x)]/x^2, 2^x, or (2^x - 1)/x.
**
** In order to minimize the number of different functions defined for the
** Remes algorithm, we define F(x) as a polynomial evaluation routine, with
** an external (global) scale factor and initial term.  This not only reduces
** the number of functions that need to be defined, but also reduces the
** required MP precision in the calculation of the coefficients, since
** the cancellation error in computations like e^x - 1 and log(x) -
** (x - x^2/2) has been eliminated.
**
** Also, in order to ensure the polynomial evaluation macro matches the
** coefficients, the invocation of genpoly that generates the evaluation
** macros is encoded as a macro definition at the time the coefficients
** are generated.  The macro is instantiated after the constant table is
** generated.
**
** Lastly, each set of coefficients is generated into the array 'coefs', so
** that it can be printed via a subroutine.  This requires that the
** coefficients are printed immediately after they are generated.
**/

#   define SET_POLY_GLOBALS(k, s, xs, fs) \
                first_term = (k); \
                first_term_value = (s); \
                x_scale = (xs); \
                final_scale = (fs)

#   define PRINT_TBL_COM_ADEF_ARRAY(com, def, deg) \
                PRINT_TBL_COM_ADEF(com, def); \
                print_array(deg)

    procedure print_array(n)
    {
        for (i = 0; i <= n; i++) {
            PRINT_TBL_ITEM(coefs[i]);
        }
    }

#   define WORKING_PRECISION (ceil(2*B_PRECISION/MP_RADIX_BITS) + 2)

    precision = WORKING_PRECISION;
    bit_precision = MP_RADIX_BITS*precision;

/*
** Pick up definitions of common MP functions and print out the
** initial boiler plate for the generated file.  As part of the boiler
** plate, record the current definitions of the macro TABLE_NAME.
** Once that has been done, undefine TABLE_NAME so that we can define
** items in the generated file relative to the symbolic value TABLE_NAME
** rather than the actual value of TABLE_NAME
*/

#   include "mphoc_functions.h"

    printf(
        "\n"
        "/* Define default table name */\n"
        "\n"
        "#if !defined(TABLE_NAME)\n"
        "#   define TABLE_NAME\t" STR(TABLE_NAME) "\n"
        "#endif\n"
        "\n"
        "#include \"dpml_private.h\"\n"
        "\n");

#   undef TABLE_NAME

    printf("\n#if !DEFINE_SYMBOLIC_CONSTANTS\n\n");

    START_TABLE;

/*
**
** GENERAL DISCUSSION OF 2^x, e^x and 10^x
** ---------------------------------------
**
** The computation of b^x for b = 2, e and 10 is based on a table look-up
** scheme, where the number of entries in the table is a power of 2,
** say 2^k.  Writing x*(lnb/ln2) as the sum of its integer, first k fraction
** bits and a reduced argument we have:
**
**      x(lnb/ln2) = I + j/2^k + w,   |w| <= 2^-(k+1)
**
** Letting z = w*(ln2/lnb) = x - (I + j/2^k)*(ln2/lnb), the computation of
** b^x proceeds as:
**
**      b^x = 2^(x(lnb/ln2))
**          = 2^(I + j/2^k + w)
**          = 2^I * 2^(j/2^k) * 2^w
**          = 2^I * 2^(j/2^k) * b^z
**          = 2^I * 2^(j/2^k) * [ 1 + z*p(z) ]                  (1)
**
** In (1), the alignment shift between 1 and z*p(z) is at least k+1 bits,
** so if care is taken in computing 2^I*2^(j/2^k) high accuracy in the
** final answer is possible.  Toward this end, we suppose the values of
** 2^(j/2^k) are stored in a table in hi and lo pieces, T(j) and L(j).
** Then (1) can be re-written as:
**
**      b^x = 2^I * 2^(j/2^k) * [ 1 + z*p(z) ]
**          = 2^I * [ T(j) + L(j) ] * [ 1 + z*p(z) ]
**          = 2^I * { T(j) + L(j) + [ T(j) + L(j) ]*z*p(z) }
**
** There are various ways to define T(j) and L(j) so that "extra"
** precision is obtained.  The definition we use here was chosen to
** optimize the performance of the fast exp and pow routines.
** In particular:
**
**      T(j) = bround( 2^(j/2^k), F_PRECISION)
**      L(j) = 2^(j/2^k) - T(j)
**
** With this definition, the term L(j)*z*p(z) is insignificant in the
** final sum and may be dropped, so that b^x can be approximated by:
**
**      b^x = 2^I * { T(j) + [ L(j) + T(j)*z*p(z) ] }          (2)
**
** In order to expose more parallelism in the computation, rather than
** storing the values of T(j) and L(j) in the tables, we store T(j) and
** R(j) = L(j)/T(j) and write (2) as:
**
**      b^x = 2^I * { T(j) + [ L(j) + T(j)*z*p(z) ] }
**          = 2^I * { T(j) + T(j)* [ R(j) + z*p(z) ] }
**          = 2^I * T(j) + 2^I*T(j)* [ R(j) + z*p(z) ]
**          = V(I,j) + V(I,j)* [ R(j) + z*p(z) ]                (3)
**
** where V(I,j) = 2^I * T(j).  Note that on pipelined architectures,
** R(j) + z*p(z) can be computed with the same latency as z*p(z) and
** on architectures with multiple functional units V(I,j) can be computed
** in the integer unit while R(j) + z*p(z) is computed in the floating
** point unit.
*/

/*
** POW2 TABLE
** ----------
**
** The pow2 table contains the 2^POW2_K th roots of 2, 2^(j/2^POW2_K).
** The table has a different form depending on whether backup precision
** is available or not.
**
** When backup precision is not available, the table contains the values
** T(j) and R(j) as defined above.  When backup precision is available,
** only T(j) is stored.
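
A minimal C sketch of the table look-up scheme in (1) through (3) for b = 2, assuming IEEE double and k = 8. It is an illustration only, not the DPML routine: the names sketch_pow2, exp_series and sk_tbl are hypothetical, the table is filled at run time from a Taylor series instead of being generated by this file, R(j) is left out, and p(z) is a truncated Taylor polynomial rather than a Remes fit.

```c
#include <assert.h>

#define SK_K 8                  // log2 of the table size (hypothetical name)
#define SK_N (1 << SK_K)        // number of table entries, 2^k

static const double SK_LN2 = 0.6931471805599453;   // ln(2) rounded to double
static double sk_tbl[SK_N];                        // T(j), close to 2^(j/2^k)
static int sk_ready = 0;

// e^y summed from its Taylor series; adequate for 0 <= y < ln(2).
static double exp_series(double y)
{
    double s = 1.0, t = 1.0;
    for (int n = 1; n <= 30; n++) {
        t = t * y / n;
        s += t;
    }
    return s;
}

// 2^x using x = I + j/2^k + w and 2^w = e^z with z = w*ln(2).
static double sketch_pow2(double x)
{
    assert(x > -1000.0 && x < 1000.0);      // keeps the scaling loops short
    if (!sk_ready) {
        for (int j = 0; j < SK_N; j++)
            sk_tbl[j] = exp_series(j * SK_LN2 / SK_N);
        sk_ready = 1;
    }
    long m = (long)(x * SK_N + (x < 0 ? -0.5 : 0.5));   // m = round(2^k * x)
    double w = x - (double)m / SK_N;                    // |w| <= 2^-(k+1)
    long j = ((m % SK_N) + SK_N) % SK_N;                // table index, 0 <= j < 2^k
    long i = (m - j) / SK_N;                            // integer part I
    double z = w * SK_LN2;
    // z*p(z), a degree-4 Taylor approximation of e^z - 1
    double zp = z * (1.0 + z * (0.5 + z * (1.0 / 6 + z * (1.0 / 24))));
    double v = sk_tbl[j];                               // becomes V(I,j) = 2^I * T(j)
    for (long n = 0; n < i; n++) v *= 2.0;
    for (long n = 0; n > i; n--) v *= 0.5;
    return v + v * zp;                                  // V + V*[z*p(z)], as in (3) without R(j)
}
```

With k = 8 the reduced argument satisfies |z| < 0.0014, which is why a very short polynomial is enough in this sketch; the generated tables trade table size (POW2_K) against polynomial degree in the same way.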
*/

#   define __PRINT_TABLE_VALUE(tchar, value) \
                printf( "\t/* %4i */ %#.4" STR(tchar) ",\n", \
                    BYTES(MP_BIT_OFFSET), value); \
                MP_BIT_OFFSET += CHAR_TO_BITS(tchar)

#   define __PRINT_TABLE_DEF(name, tchar, disp) \
                printf("#define " name "\t*((" STR(CHAR_TO_TYPE(tchar)) \
                    " *) ((char *) " STR(MP_TABLE_NAME) \
                    " + %i + (j)))\n", BYTES(disp)); \
                disp += CHAR_TO_BITS(tchar)

#   if (USE_BACKUP)

#       define POW2_TABLE_BANNER \
                "\n\t * Tj = 2^(j/2^POW2_K)" \
                "\n\t *" \
                "\n\t * offset row" \
                "\n\t"

#       define PRINT_POW2_TABLE_ACCESS_MACROS(disp) \
                PRINT_LOG_TABLE_DEF("GET_POW2(j)\t", B_CHAR, disp)

#       define POW2_INDEX_POS (__LOG2(BITS_PER_B_TYPE) - 3)

#       define PRINT_POW2_TABLE_ENTRY(j, Pj) \
                printf( "\t/* %4i */ %#.4" STR(B_CHAR), ", /* %3i */", \
                    BYTES(MP_BIT_OFFSET), Pj, j); \
                MP_BIT_OFFSET += BITS_PER_B_TYPE

#   else /* USE_BACKUP */

#       define POW2_TABLE_BANNER \
                "\n\t * Tj = 2^(j/2^POW2_K) and Rj = [2^(j/2^POW2_K) - Tj]/Tj." \
                "\n\t *" \
                "\n\t * offset row" \
                "\n\t"

#       define PRINT_POW2_TABLE_ACCESS_MACROS(disp) \
                __PRINT_TABLE_DEF("POW2_HI(j)\t", F_CHAR, disp); \
                __PRINT_TABLE_DEF("POW2_LO_OV_POW2_HI(j)", F_CHAR, disp)

#       define POW2_INDEX_POS (__LOG2(BITS_PER_F_TYPE) - 2)

#       define PRINT_POW2_TABLE_ENTRY(j, Pj) \
                Pj_hi = bround(Pj, F_PRECISION); \
                printf("\t/* %4i */ %#.4" STR(F_CHAR) ", /* %3i */\n", \
                    BYTES(MP_BIT_OFFSET), Pj, j); \
                MP_BIT_OFFSET += BITS_PER_F_TYPE; \
                __PRINT_TABLE_VALUE(F_CHAR, (Pj - Pj_hi)/Pj)

#endif

    disp = MP_BIT_OFFSET;
    root_disp = disp;
    PRINT_POW2_TABLE_ACCESS_MACROS(disp);

    /*
    ** As noted above, the quantity V(I,j) = 2^I*T(j) is computed in an
    ** integer register.  The following code prints out definitions for
    ** accessing T(j) as an integer.  If the word size is smaller than the
    ** F_TYPE size, we need to access it in two pieces.  Make sure to take
    ** into account
Make sure to take into account ** "endianess" */ if (BITS_PER_WORD < BITS_PER_F_TYPE) { disp_lo = root_disp; if ((VAX_FLOATING) || (ENDIANESS == big_endian)) disp_lo = root_disp + (BITS_PER_F_TYPE - BITS_PER_WORD); else root_disp += (BITS_PER_F_TYPE - BITS_PER_WORD); /* ** If the word size is verfy small relative to the floating point ** type, get the low order bits in a F_UNION by loading the whole ** floating point type. Otherwise, just load the low word */ if (BITS_PER_WORD*2 < BITS_PER_F_TYPE) { printf("#define IPOW2_LO(u,j)\t\tu.f = " "*((B_TYPE *) ((char *) " STR(MP_TABLE_NAME) " + (j)))\n"); } else { printf("#define IPOW2_LO(u,j)\t\tu.B_LO_WORD = " "*((WORD *) ((char *) " STR(MP_TABLE_NAME) " + %i + (j)))\n", BYTES(disp_lo)); } } __PRINT_TABLE_DEF("IPOW2(j)\t", w, root_disp); printf("#define POW2_INDEX_POS\t\t%i \n", POW2_INDEX_POS); TABLE_COMMENT( POW2_TABLE_BANNER ); pow2_table_size = 2^POW2_K; for (j = 0; j < pow2_table_size; j++) { Pj = 2^(j/pow2_table_size); PRINT_POW2_TABLE_ENTRY( j, Pj); } /* ** Error Checking: ** --------------- ** ** b^x can both underflow and overflow. Consequently some type of error ** check (screening) must eventually take place. Since the appropriate ** timing and nature of the screening varies from function to function, it ** is discussed with the individual functions. ** ** That said, all of the function using the pow2 table, have a "final" ** underflow/overflow check near the very end of the routine. The check ** is based on the fact that the computation of V(I,j) is done in an ** integer register and provides a very good approximation to the final ** answer. We can use integer comparisons on the bit pattern for V(I,j) ** to eliminate all potential overflows and underflows just prior to or ** just after the last floating point operation(s). 
    */

    c = 2^(1/pow2_table_size);
    lo = F_HI_BITS_RND(2^(F_MIN_BIN_EXP + F_NORM + F_PRECISION + POW2_K)*c,
        MP_RP);
    hi = F_HI_BITS_RND(2^(F_MAX_BIN_EXP + F_NORM + 1)/c, MP_RM);
    PRINT_U_TBL_COM_VDEF_ITEM("F_PRECISION acc pow2 result range check",
        "POW2_LO_CHECK_F\t", lo);
    PRINT_U_TBL_VDEF_ITEM("POW2_HI_CHECK_F\t", hi - lo);
    PRINT_U_TBL_VDEF_ITEM("POW2_MAX_SCALE_F\t", hi);

    if (!ONE_TYPE) {
        lo = F_HI_BITS_RND(
            2^(R_MIN_BIN_EXP + R_NORM + R_PRECISION + POW2_K)*c, MP_RP);
        hi = F_HI_BITS_RND(2^(R_MAX_BIN_EXP + R_NORM + 1)/c, MP_RM);
        PRINT_U_TBL_COM_VDEF_ITEM("R_PRECISION acc pow2 result range check",
            "POW2_LO_CHECK_R\t", lo);
        PRINT_U_TBL_VDEF_ITEM("POW2_HI_CHECK_R\t", hi - lo);
        PRINT_U_TBL_VDEF_ITEM("POW2_MAX_SCALE_R\t", hi);
    } ENDIF

    /*
    ** Computation of I, j and w:
    ** --------------------------
    **
    ** From the above discussion, we see that at some point in the evaluation
    ** of b^x, we need to take a floating point value and break it into its
    ** integer part, high fraction bits and low fraction bits.  If z is the
    ** value we want to break apart, then the conceptual computation that is
    ** performed is:
    **
    **          t <-- rint(2^k*z)
    **          w  =  z - t/2^k
    **          m <-- (WORD) t
    **          i <-- m >> k
    **          j <-- m & (2^k - 1)
    **
    ** In actuality, the first three steps of the above are performed by
    ** taking z, adding and then subtracting a large positive constant, BIG.
    ** BIG is chosen so that the low order fraction bits of z are discarded
    ** due to the alignment shift leaving only the integer and high fraction
    ** bits.  Specifically:
    **
    **          BIG <-- 3*2^(B_PRECISION - k - 2)
    **          u   <-- BIG + z
    **          fm  <-- u - BIG
    **
    ** Note that if B_PRECISION > 32 and the rounding mode is round to
    ** nearest, then the low order 32 bits of u are the two's complement
    ** representation of m, and fm = t/2^k.
    **
    **
    ** Polynomial Generation For 2^x, e^x and 10^x:
    ** --------------------------------------------
    **
    ** The coefficients for 2^x are based on the Taylor series expansion
    ** for e^x:
    **
    **          e^x = 1 + x + x^2/2! + x^3/3! + ....
    **
    ** with the variable x replaced by x = z * ln2:
    **
    **          2^z = 1 + ln2*z + z^2*(ln2)^2/2! + z^3*(ln2)^3/3! + ....
    **              = 1 + z*(ln2 + z*(ln2)^2/2! + z^2*(ln2)^3/3! + ....)
    **              = 1 + z*P(z)
    **
    ** In both cases, the size of the argument being evaluated is dictated
    ** by k.
    */

    ln2 = log(2.0);
    recip_ln2 = 1/ln2;
    ln2_ov_ln10 = ln2/log(10.);
    ln10_ov_ln2 = log(10.0)/ln2;
    max_exp_x = .5/pow2_table_size;
    max_pow2_x = max_exp_x*ln2;

    /*
    ** The following function is used by the Remes algorithm to generate
    ** min/max coefficients for e^x and 2^x.  We can approximate e^x, e^x - 1
    ** and e^x - (1 + x) by specifying the (first_term, first_term_value)
    ** parameters as (0, 1), (1, 1) and (2, .5) respectively.  By changing
    ** the x_scale and final_scale values from 1 to appropriate powers of
    ** ln2, we can similarly evaluate 2^x, 2^x - 1 and 2^x - (1 + x*ln2)
    **
    */

    function e_to_x_poly(x)
    {
        auto s, z, k, t;

        s = first_term_value;
        if (x != 0) {
            k = first_term;
            z = x*x_scale;
            t = first_term_value;
            while(1) {
                k++;
                t = (t*z)/k;
                if ((bexp(s) - bexp(t)) > bit_precision)
                    break;
                s += t;
            }
        } ENDIF
        return s*final_scale;
    }

    /*
    ** All of the Remes invocations for exp/pow2 coefficient generation have
    ** the same form, so we make the corresponding code a macro.
    */

#   define GEN_EXP_COEFS(max_x, prec, deg, com, tag) \
        { \
        remes(REMES_FIND_POLYNOMIAL + REMES_RELATIVE_WEIGHT + \
            REMES_LINEAR_ARG, -max_x, max_x, e_to_x_poly, prec, \
            &deg, &coefs); \
        PRINT_TBL_COM_ADEF_ARRAY(com, tag, deg); \
        }

    /*
    ** CONSTANTS FOR FAST EXP
    ** ----------------------
    **
    ** In fast exp, we use the identity e^x = 2^(x/ln2).  Since we would like
    ** to delay the screening for overflow and underflow for as long as
    ** possible (to increase parallelism) and since x/ln2 might overflow,
    ** we perform the initial calculation as:
    **
    **          w  <-- x*[ 1/(2^n*ln2) ]
    **          t  <-- BIG/2^n + w
    **          fm <-- t - BIG/2^n
    **          z  <-- w - fm
    **
    ** This produces a reduced argument, z, "scaled down" by 2^n.
    ** We can
    ** compensate for the scale factor in z by adjusting the coefficients
    ** in the polynomial evaluation.
    **
    ** Note that if backup precision is not available, the computation of
    ** z is more complicated than indicated.  Specifically, we must compute
    ** w = x*[ 1/(2^n*ln2) ] to extra precision by breaking x and 1/(2^n*ln2)
    ** into high and low pieces.
    **
    ** Other than requiring that n >= 1, the exact choice of n in the above
    ** discussion is arbitrary.  We choose n = F_EXP_WIDTH because we can
    ** then share the constants with the fast pow routine.  (See below)
    */

    scale_down = 2^-F_EXP_WIDTH;
    fast_big = 3*2^(B_PRECISION - POW2_K - 2 - F_EXP_WIDTH);
    printf("#define SCALE_DOWN_EXP\t%i \n", F_EXP_WIDTH);

    if (!NO_FAST) {
        PRINT_TBL_COM_VDEF_ITEM("'big' for fast pow/exp rint computation",
            "FAST_BIG\t", fast_big);
        c = scale_down*recip_ln2;
        if (ONE_TYPE) {
            PRINT_TBL_COM_VDEF_ITEM("2^-F_EXP_WIDTH/ln2",
                "SCALE_DOWN_OVER_LN2\t", c);
        } else {
            TABLE_COMMENT("2^-F_EXP_WIDTH/log(2) in full, hi, lo");
            c_hi = bround(c,
                F_PRECISION - F_HI_HALF_PRECISION - 2*LOG2_K + 1);
            PRINT_TBL_VDEF_ITEM("SCALE_DOWN_OV_LN2", c);
            PRINT_TBL_VDEF_ITEM("SCALE_DOWN_OV_LN2_HI", c_hi);
            PRINT_TBL_VDEF_ITEM("SCALE_DOWN_OV_LN2_LO", c - c_hi);
        }

        /*
        ** For fast exp, we delay screening for overflow and underflow
        ** until just before the polynomial evaluation.  At that point
        ** we have obtained the high bits of the input argument as an
        ** integer and can perform the screening with integer operations.
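
The add-and-subtract rounding with BIG described earlier (u = BIG + z, fm = u - BIG) can be sketched in C as follows. This is a hedged illustration, not the library's code: it assumes IEEE double (so B_PRECISION is taken as 53), round-to-nearest mode, and |z| < 2^(31-k); the name split_with_big is hypothetical, and the unscaled form BIG = 3*2^(53 - k - 2) is used rather than the scaled-down FAST_BIG.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Split z into fm (z rounded to a multiple of 2^-k) and m = fm * 2^k.
static double split_with_big(double z, int k, int32_t *m)
{
    assert(k >= 0 && k <= 20);
    // BIG = 3*2^(53 - k - 2): doubles near BIG are spaced exactly 2^-k apart,
    // so the addition below discards the bits of z below 2^-k.
    // volatile keeps the compiler from simplifying (big + z) - big.
    volatile double big = 3.0 * (double)((uint64_t)1 << (53 - k - 2));
    volatile double u = big + z;
    double u_copy = u;
    uint64_t bits;
    memcpy(&bits, &u_copy, sizeof bits);
    // The low 32 bits of u hold m in two's complement.  The conversion to
    // int32_t wraps on common compilers (implementation-defined in ISO C).
    *m = (int32_t)(uint32_t)bits;
    return u - big;     // fm, an exact subtraction
}
```

For example, with k = 8 and z = 1.3 the sketch yields m = 333 and fm = 333/256, leaving a reduced argument z - fm of magnitude at most 2^-9.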
        */

        c = ln2*max(-(F_MIN_BIN_EXP + F_NORM), F_MAX_BIN_EXP + 1 + F_NORM);
        PRINT_U_TBL_COM_VDEF_ITEM("Fast exp F_PRECISION arg range check",
            "FAST_EXP_RANGE_CHECK_F", F_HI_BITS_RND(c, MP_RP));

        if (!ONE_TYPE) {
            c = ln2*max(-(R_MIN_BIN_EXP + R_NORM),
                R_MAX_BIN_EXP + 1 + R_NORM);
            PRINT_U_TBL_COM_VDEF_ITEM("Fast exp R_PRECISION arg range check",
                "FAST_EXP_RANGE_CHECK_R", F_HI_BITS_RND(c, MP_RP));
        } ENDIF

        /*
        ** As noted above, the fast pow and exp routines scale their input
        ** argument down to avoid premature overflow and we need to
        ** compensate for it in the polynomial coefficients.
        **
        ** The actual form of the polynomial evaluated depends on whether
        ** or not backup precision is available.  If it is, we use a
        ** polynomial for 2^x, otherwise we use one for 2^x - 1
        */

        if (USE_BACKUP) {
            SET_POLY_GLOBALS(0, 1, ln2, 1);
            GEN_EXP_COEFS(max_pow2_x, F_PRECISION + 1, fast_pow2_deg_f,
                "F_PRECISION fast pow2 poly coeffs", "FAST_POW2_F\t")
            GENPOLY(FAST_POW2_F[%%d], FAST_POW2_POLY_F(x), fast_pow2_deg_f);
        } else {
            max_arg = max_pow2_x*scale_down;
            c = ln2/scale_down;
            SET_POLY_GLOBALS(0, 1, c, 1);
            GEN_EXP_COEFS(max_arg, F_PRECISION + 1, fast_pow2_deg_f,
                "F_PRECISION fast pow2 poly coeffs", "FAST_POW2_F\t")
            GENPOLY(FAST_POW2_F[%%d], FAST_POW2_POLY_F(x), fast_pow2_deg_f);

            if (!ONE_TYPE) {
                SET_POLY_GLOBALS(0, 1, ln2, 1);
                GEN_EXP_COEFS(max_pow2_x, R_PRECISION + 1, fast_pow2_deg_r,
                    "R_PRECISION fast pow2 poly coeffs", "FAST_POW2_R\t")
                GENPOLY(FAST_POW2_R[%%d], FAST_POW2_POLY_R(x),
                    fast_pow2_deg_r);
            } ENDIF
        }
    } ENDIF

    /*
    ** CONSTANTS FOR 2^x EVALUATION IN FAST POW
    ** ----------------------------------------
    **
    ** In fast pow, we use the identity x^y = 2^(y*log2(x)).  As in fast exp,
    ** we would like to delay the screening for overflow and underflow for as
    ** long as possible but we need to avoid overflow when computing the
    ** product y*log2(x).  To do this, we scale y down by an appropriate
    ** power of 2 prior to performing the multiplication.
    ** Since
    **
    **     2^(F_MIN_BIN_EXP - F_PRECISION + 1) <= x < 2^F_MAX_BIN_EXP
    **
    ** it follows that
    **
    **     (F_MIN_BIN_EXP - F_PRECISION + 1) <= log2(x) < F_MAX_BIN_EXP
    **
    ** On the platforms currently supported:
    **
    **     2^F_EXP_WIDTH > | F_MIN_BIN_EXP-F_PRECISION+1 | >= | F_MAX_BIN_EXP |
    **
    ** so that |log2(x)| < 2^F_EXP_WIDTH.  Therefore, the product
    ** (y * 2^-F_EXP_WIDTH)*log2(x) is guaranteed not to overflow.  Note that
    ** (y * 2^-F_EXP_WIDTH) might underflow.  But in this case the correct
    ** result of x^y is 1 to machine precision.  So even if underflow occurs
    ** the correct result will be returned.
    **
    ** For fast pow, we delay any overflow/underflow checks until just before
    ** the evaluation of the exponential polynomial.  At that point we
    ** perform a gross level check on x and y to screen out all guaranteed
    ** exceptions.  Specifically we need to check for very large (positive
    ** or negative) y since these will cause guaranteed overflows or
    ** underflows.
    */

    acc_big = 3*2^(B_PRECISION - POW2_K - 2);

    if (!NO_FAST) {
        PRINT_TBL_COM_VDEF_ITEM(
            "Power of 2 to scale down y: 2^-F_EXP_WIDTH",
            "SCALE_DOWN\t", scale_down);

        tmp = as_int(acc_big, 32, F_EXP_WIDTH, MP_F_EXP_BIAS, MP_RZ);
        printf("#define ACC_BIG_HI_32\t\t0x%8.8.16i \n", tmp + 1);
        tmp = as_int(fast_big, 32, F_EXP_WIDTH, MP_F_EXP_BIAS, MP_RZ);
        printf("#define FAST_BIG_HI_32\t\t0x%8.8.16i \n", tmp + 1);
    } ENDIF

    /*
    ** CONSTANTS FOR 2^x EVALUATION IN ACCURATE POW
    ** ---------------------------------------------
    **
    ** In the accurate power routine, both x and y are screened prior to
    ** any computation, so it is unnecessary to scale y to avoid overflow,
    ** and consequently we don't need to compensate for the scale in the
    ** polynomial coefficients.  Also, in order to minimize the number of
    ** operations performed, the argument reduction is performed as
    ** z = (x - fm*LN2_HI) - fm*LN2_LO, when backup precision is not
    ** available.
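
The two-step reduction z = (x - fm*LN2_HI) - fm*LN2_LO can be sketched in C as below. The constants are the hi/lo split of ln(2) used in fdlibm's e_exp.c, chosen here only because their values are well known; they are not the LN2_HI/LN2_LO values this file generates with bround(ln2, R_PRECISION). The name reduce_by_ln2 and the use of an integer multiplier n in place of fm are assumptions made for the illustration.

```c
#include <assert.h>

// fdlibm's split of ln(2): SK_LN2_HI has its low 21 significand bits zero,
// so n * SK_LN2_HI is exact in double arithmetic for |n| < 2^21.
static const double SK_LN2_HI = 6.93147180369123816490e-01;
static const double SK_LN2_LO = 1.90821492927058770002e-10;

// z = x - n*ln(2), evaluated as (x - n*HI) - n*LO.
static double reduce_by_ln2(double x, int n)
{
    assert(n > -(1 << 21) && n < (1 << 21));
    double head = x - n * SK_LN2_HI;    // exact product; the subtraction is
                                        // exact when x is close to n*ln(2)
    return head - n * SK_LN2_LO;        // restores the low-order bits of ln(2)
}
```

Subtracting n*ln(2) with a single rounded constant would lose roughly log2(n) bits of the reduced argument; keeping the leading product exact is what the hi/lo split buys.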
    */

    if (!USE_BACKUP) {
        /* ln2_ are also used in the log2 part of pow */

        c_hi = bround(ln2, R_PRECISION);
        PRINT_TBL_COM_VDEF_ITEM("ln2 in hi/lo", "LN2_HI\t\t", c_hi);
        PRINT_TBL_VDEF_ITEM("LN2_LO\t\t", ln2 - c_hi);
        c_hi = bround(ln2_ov_ln10, R_PRECISION);
        PRINT_TBL_COM_VDEF_ITEM("ln2/ln10 in hi/lo", "LN2_OV_LN10_HI\t\t",
            c_hi);
        PRINT_TBL_VDEF_ITEM("LN2_OV_LN10_LO\t\t", ln2_ov_ln10 - c_hi);
    }

    if (!NO_ACC) {
        if (USE_BACKUP) {
            /* Approximate 2^x to extra precision */

            SET_POLY_GLOBALS(0, 1, ln2, 1);
            GEN_EXP_COEFS(max_pow2_x, F_PRECISION + POW2_K + 1,
                acc_pow2_deg_f, "F_PRECISION acc pow2 poly coeffs",
                "ACC_POW2_F\t")
            GENPOLY(ACC_POW2_F[%%d], ACC_POW2_POLY_F(x), acc_pow2_deg_f);
        } else {
            /* Approximate 2^x - 1 to base precision */

            SET_POLY_GLOBALS(1, 1, ln2, ln2);
            GEN_EXP_COEFS(max_pow2_x, F_PRECISION + 1, acc_pow2_deg_f,
                "F_PRECISION acc pow2 poly coeffs", "ACC_POW2_F\t");
            _GENPOLY(ACC_POW2_F[%%d], ACC_POW2_POLY_F(t,x), -1, c0=t,
                acc_pow2_deg_f + 1);

            if (!ONE_TYPE) {
                SET_POLY_GLOBALS(0, 1, ln2, 1);
                GEN_EXP_COEFS(max_pow2_x, R_PRECISION + POW2_K + 1,
                    acc_pow2_deg_r, "R_PRECISION acc pow2 poly coeffs",
                    "ACC_POW2_R\t")
                GENPOLY(ACC_POW2_R[%%d], ACC_POW2_POLY_R(x), acc_pow2_deg_r);
            }
        }
    } ENDIF

    /*
    ** CONSTANTS FOR ACCURATE EXP
    ** --------------------------
    **
    ** As with accurate power, accurate exp screens its argument prior to
    ** any floating point calculation, so it is unnecessary to scale
    ** the product x*(1/ln2).  This means that the value of BIG and the
    ** polynomial coefficients also don't require any scaling
    */

    if (!NO_ACC) {
        PRINT_TBL_COM_VDEF_ITEM("'big' for accurate pow/exp rint computation",
            "ACC_BIG\t\t", acc_big);

        /*
        ** For accurate exp, the initial screening weeds out large arguments
        ** (guaranteed overflow or underflow), NaNs and Infinities and very
        ** small arguments (for which the final result is 1.)
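
The screening constants in this file are emitted as a low bound and a difference (for example POW2_LO_CHECK_F and POW2_HI_CHECK_F = hi - lo). One common way to consume such a pair, shown here as an assumption about how the generated values might be used and not as the library's actual code, is a single unsigned comparison; the function name and the 32-bit width are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Returns nonzero when v lies outside [lo, lo + width].  For v < lo the
// unsigned subtraction wraps around to a very large value, so one compare
// rejects values on both sides of the interval.
static int needs_special_handling(uint32_t v, uint32_t lo, uint32_t width)
{
    return (uint32_t)(v - lo) > width;
}
```

Here v would be the high bits of the argument (or of the result) viewed as an unsigned integer, which orders the same way as the magnitude of a positive floating-point value.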
        */

        if (IEEE_FLOATING)
            lo = (F_MIN_BIN_EXP + F_NORM - F_PRECISION)*ln2;
        else
            lo = (F_MIN_BIN_EXP + F_NORM)*ln2;
        hi = (F_MAX_BIN_EXP + F_NORM)*ln2 + log((2 - 2^-F_PRECISION));
        lo_check = F_HI_BITS_RND(2^-(F_PRECISION + 1), MP_RM);
        hi_check = F_HI_BITS_RND(max(-lo, hi), MP_RP);

        TABLE_COMMENT("F_PRECISION argument and result screening values");
        PRINT_U_TBL_VDEF_ITEM("EXP_LO_CHECK_F\t", lo_check);
        PRINT_U_TBL_VDEF_ITEM("EXP_HI_CHECK_F\t", hi_check - lo_check);

        if (!ONE_TYPE) {
            if (IEEE_FLOATING)
                lo = (R_MIN_BIN_EXP - R_NORM - R_PRECISION)*ln2;
            else
                lo = (R_MIN_BIN_EXP - R_NORM)*ln2;
            hi = (R_MAX_BIN_EXP - R_NORM)*ln2 + log((2 - 2^-R_PRECISION));
            lo_check = R_HI_BITS_RND(2^-(R_PRECISION + 1), MP_RM);
            hi_check = R_HI_BITS_RND(max(-lo, hi), MP_RP);

            TABLE_COMMENT(
                "R_PRECISION argument and result screening values");
            PRINT_U_TBL_VDEF_ITEM("EXP_LO_CHECK_R\t", lo_check);
            PRINT_U_TBL_VDEF_ITEM("EXP_HI_CHECK_R\t", hi_check - lo_check);
        } ENDIF

        /*
        ** Similarly, for 2^x, initial screening to weed out large arguments
        ** (guaranteed overflow or underflow), NaNs and Infinities.
*/ if (IEEE_FLOATING) lo = (F_MIN_BIN_EXP + F_NORM - F_PRECISION) ; else lo = (F_MIN_BIN_EXP + F_NORM); hi = (F_MAX_BIN_EXP + F_NORM) + log2((2 - 2^-F_PRECISION)); hi_check = F_HI_BITS_RND(max(-lo, hi), MP_RP); lo_check = F_HI_BITS_RND(2^-(F_PRECISION + 1), MP_RM); TABLE_COMMENT("F_PRECISION argument screening values for 2^x"); PRINT_U_TBL_VDEF_ITEM("EXP2_HI_CHECK_F\t", hi_check - lo_check); if (!ONE_TYPE) { if (IEEE_FLOATING) lo = (R_MIN_BIN_EXP - R_NORM - R_PRECISION); else lo = (R_MIN_BIN_EXP - R_NORM); hi = (R_MAX_BIN_EXP - R_NORM) + log2((2 - 2^-R_PRECISION)); hi_check = R_HI_BITS_RND(max(-lo, hi), MP_RP); lo_check = R_HI_BITS_RND(2^-(R_PRECISION + 1), MP_RM); TABLE_COMMENT( "R_PRECISION argument and result sreening values"); PRINT_U_TBL_VDEF_ITEM("EXP2_HI_CHECK_R\t",hi_check - lo_check); } ENDIF /* ** Once again for the 10^x case */ if (IEEE_FLOATING) lo = (F_MIN_BIN_EXP + F_NORM - F_PRECISION)*ln2_ov_ln10; else lo = (F_MIN_BIN_EXP + F_NORM)*ln2_ov_ln10; hi = (F_MAX_BIN_EXP + F_NORM)*ln2_ov_ln10 + log((2 - 2^-F_PRECISION)); lo_check = F_HI_BITS_RND(2^-(F_PRECISION + 1), MP_RM); hi_check = F_HI_BITS_RND(max(-lo, hi), MP_RP); TABLE_COMMENT("F_PRECISION argument and result sreening values for 10^x"); PRINT_U_TBL_VDEF_ITEM("EXP10_LO_CHECK_F\t", lo_check); PRINT_U_TBL_VDEF_ITEM("EXP10_HI_CHECK_F\t", hi_check - lo_check); if (!ONE_TYPE) { if (IEEE_FLOATING) lo = (R_MIN_BIN_EXP - R_NORM - R_PRECISION)*ln2_ov_ln10; else lo = (R_MIN_BIN_EXP - R_NORM)*ln2_ov_ln10; hi = (R_MAX_BIN_EXP - R_NORM)*ln2_ov_ln10 + log((2 - 2^-R_PRECISION)); lo_check = R_HI_BITS_RND(2^-(R_PRECISION + 1), MP_RM); hi_check = R_HI_BITS_RND(max(-lo, hi), MP_RP); TABLE_COMMENT( "R_PRECISION argument and result sreening values for 10^x"); PRINT_U_TBL_VDEF_ITEM("EXP10_LO_CHECK_R\t", lo_check); PRINT_U_TBL_VDEF_ITEM("EXP10_HI_CHECK_R\t", hi_check - lo_check); } ENDIF /* ** When backup precision is available, accurate exp uses a polynomial ** for 2^x otherwise it uses one for e^x. 
    **/

    if (USE_BACKUP) {
        SET_POLY_GLOBALS(0, 1, ln2, 1);
        GEN_EXP_COEFS(max_exp_x, F_PRECISION + POW2_K + 1, acc_exp_deg_f,
            "F_PRECISION acc exp poly coeffs", "ACC_EXP_F\t");
        GENPOLY(ACC_EXP_F[%%d], ACC_EXP_POLY_F(x), acc_exp_deg_f);
    } else {
        SET_POLY_GLOBALS(1, 1, 1, 1);
        GEN_EXP_COEFS(max_exp_x, F_PRECISION + 1, acc_exp_deg_f,
            "F_PRECISION acc exp poly coeffs", "ACC_EXP_F\t");
        _GENPOLY(ACC_EXP_F[%%d], ACC_EXP_POLY_F(t,x), -1, c0=t, acc_exp_deg_f + 1);
        /*
        ** NOTE: if (!ONE_TYPE) then ACC_EXP_POLY is identical
        ** to ACC_POW2_POLY
        */
    }
} ENDIF

/*
** CONSTANTS FOR EXPM1
** -------------------
**
** For expm1, we essentially compute the accurate exp function and
** subtract 1. However, to maintain accuracy in all cases, when
** backup precision is not available, we need to evaluate
** e^z as 1 + z + z^2*q(z) rather than as 1 + z*p(z).
**
** Also, screening the input argument is a little more involved. We need
** to screen for large arguments (both positive and negative) and small
** arguments (where a polynomial approximation is appropriate).
**
** The bound for large positive arguments is the same as for exp. For
** large negative arguments, we want to know where expm1(x) = -1 to
** machine precision. Because the check is done on both positive and
** negative arguments on a sign/magnitude value, it is done in two
** parts, one for the positive arguments and one for the negative
** arguments.
**
** We arbitrarily define the polynomial range to have at least the same
** "effective" overhang as the table range. ("Effective" overhang is
** actual overhang less the number of bits of error in the smaller term.)
*/ expm1_max_poly_arg = 2/pow2_table_size; poly = F_HI_BITS_RND(expm1_max_poly_arg, MP_RM); lo = F_HI_BITS_RND((F_PRECISION + 1)*ln2, MP_RP); hi = F_HI_BITS_RND((F_MAX_BIN_EXP + F_NORM + 1)*ln2, MP_RP); PRINT_U_TBL_COM_VDEF_ITEM("F_PRECISION expm1 initial screening constants", "EXPM1_POLY_CHECK_F", poly); PRINT_U_TBL_VDEF_ITEM("EXPM1_HI_CHECK_F", hi); PRINT_U_TBL_VDEF_ITEM("EXPM1_LO_CHECK_F", lo); if (!ONE_TYPE) { poly = R_HI_BITS_RND(expm1_max_poly_arg, MP_RM); lo = R_HI_BITS_RND((R_PRECISION + 1)*ln2, MP_RM); hi = R_HI_BITS_RND((R_MAX_BIN_EXP + R_NORM + 1)*ln2, MP_RP); PRINT_U_TBL_COM_VDEF_ITEM( "R_PRECISION expm1 initial screening constants", "EXPM1_POLY_CHECK_R", poly); PRINT_U_TBL_VDEF_ITEM("EXPM1_HI_CHECK_R", hi); PRINT_U_TBL_VDEF_ITEM("EXPM1_LO_CHECK_R", lo); } ENDIF expm1_max_red_arg = 2/2^POW2_K; if (USE_BACKUP) { SET_POLY_GLOBALS(1, 1, 1, 1); GEN_EXP_COEFS(expm1_max_poly_arg, F_PRECISION + POW2_K, expm1_poly_deg_f, "F_PRECISION expm1 poly range poly coeffs", "EXPM1_F\t\t"); _GENPOLY(EXPM1_F[%%d], EXPM1_POLY_F(x), -1, c0=0, expm1_poly_deg_f + 1); SET_POLY_GLOBALS(1, 1, ln2, ln2); GEN_EXP_COEFS(expm1_max_red_arg, F_PRECISION + POW2_K, expm1_red_deg_f, "F_PRECISION expm1 reduce range poly coeffs", "EXPM1_RED_F\t"); _GENPOLY(EXPM1_RED_F[%%d], EXPM1_RED_POLY_F(x), -1, c0=0, expm1_red_deg_f + 1); } else { SET_POLY_GLOBALS(2, .5, 1, 1); GEN_EXP_COEFS(expm1_max_poly_arg, F_PRECISION + 1, expm1_poly_deg_f, "F_PRECISION expm1 poly range poly coeffs", "EXPM1_F\t\t"); _GENPOLY(EXPM1_F[%%d], EXPM1_POLY_F(x) (x) +, -2, c0=0 c1=0, expm1_poly_deg_f + 2); GEN_EXP_COEFS(expm1_max_red_arg, F_PRECISION + 1, expm1_red_deg_f, "F_PRECISION expm1 reduce range poly coeffs", "EXPM1_RED_F\t"); _GENPOLY(EXPM1_RED_F[%%d], EXPM1_RED_POLY_F(t,x), -2, c0=t c1=0, expm1_red_deg_f + 2); if (!ONE_TYPE) { SET_POLY_GLOBALS(1, 1, 1, 1); GEN_EXP_COEFS(expm1_max_poly_arg, R_PRECISION + POW2_K, expm1_poly_deg_r, "R_PRECISION expm1 poly range poly coeffs", "EXPM1_R\t\t"); _GENPOLY(EXPM1_R[%%d], 
EXPM1_POLY_R(x), -1, c0=0, expm1_poly_deg_r + 1); SET_POLY_GLOBALS(1, 1, ln2, ln2); GEN_EXP_COEFS(expm1_max_red_arg, R_PRECISION + POW2_K, expm1_red_deg_r, "R_PRECISION expm1 reduce range poly coeffs", "EXPM1_RED_R\t"); _GENPOLY(EXPM1_RED_R[%%d], EXPM1_RED_POLY_R(x), -1, c0=0, expm1_red_deg_r + 1); } ENDIF } /* ** CONSTANTS FOR SINH/COSH ** ----------------------- ** ** For sinh/cosh, we screen for large arguments (both positive and ** negative) and small arguments (where a polynomial approximation is ** appropriate). ** ** The bound for large arguments is log(2*F_MAX). ** ** We arbitrarily define the polynomial range to have at least the same ** "effective" overhang as the table range. ("Effective" overhang is ** actual overhang less the number of bits of error in the smaller term.) */ sinhcosh_max_poly_arg = sqrt(8/pow2_table_size); hi = F_HI_BITS_RND( (F_MAX_BIN_EXP + 1 + F_NORM)*ln2 + log((2 - 2^-(F_PRECISION - 1))), MP_RP); lo = F_HI_BITS_RND(sinhcosh_max_poly_arg, MP_RM); TABLE_COMMENT("F_PRECISION sinh/cosh argument screening constants"); PRINT_U_TBL_VDEF_ITEM("SINHCOSH_OVERFLOW_CHECK_F", hi); PRINT_U_TBL_VDEF_ITEM("SINHCOSH_BIG_CHECK_F", hi - lo); PRINT_U_TBL_VDEF_ITEM("SINHCOSH_POLY_CHECK_F", lo); if (!ONE_TYPE) { hi = R_HI_BITS_RND((R_MAX_BIN_EXP + 1 - R_NORM)*ln2 + log((2 - 2^-(R_PRECISION - 1))), MP_RP); lo = R_HI_BITS_RND(sinhcosh_max_poly_arg, MP_RM); TABLE_COMMENT("R_PRECISION sinh/cosh argument screening constants"); PRINT_U_TBL_VDEF_ITEM("SINHCOSH_OVERFLOW_CHECK_R", hi); PRINT_U_TBL_VDEF_ITEM("SINHCOSH_BIG_CHECK_R", hi - lo); PRINT_U_TBL_VDEF_ITEM("SINHCOSH_POLY_CHECK_R", lo); } ENDIF /* ** ** The coefficients for sinh/cosh are based on the Taylor series expansions ** ** sinh(x) = x + x^3/3! + x^5/5! .... ** = x*[1 + x^2*P(x^2)] ** ** cosh(x) = 1 + x^2/2! + x^4/4! .... 
**         = 1 + x^2*Q(x^2)
**
** On the reduced range, the coefficients for accurate exp(x) are used and
** simply broken up into even and odd polynomials.
**
** The following function is used for the Remes approximation in much the
** same way as the e_to_x_poly() function is used. That is, by
** appropriately setting the values first_term, first_term_value, x_scale
** and final_scale, we can approximate sinh(x), cosh(x), sinh(x) - x,
** cosh(x) - 1, sinh(x*ln2), cosh(x*ln2), ...
*/

function sinh_cosh_poly(x)
{
    auto s, z, k, t;

    s = first_term_value;
    if (x != 0) {
        k = first_term;
        z = (x*x)*x_scale;
        t = first_term_value;
        while(1) {
            k += 2;
            t = (t*z)/(k*k - k);
            if ((bexp(s) - bexp(t)) > bit_precision)
                break;
            s += t;
        }
    } ENDIF
    return s*final_scale;
}

# define GEN_SINH_COSH_COEFS(max_x, prec, deg, com, tag) \
    { \
    remes(REMES_FIND_POLYNOMIAL + REMES_RELATIVE_WEIGHT + \
        REMES_SQUARE_ARG, 0, max_x, sinh_cosh_poly, \
        prec, &deg, &coefs); \
    PRINT_TBL_COM_ADEF_ARRAY(com, tag, deg); \
    }

if (USE_BACKUP) {
    SET_POLY_GLOBALS(1, 1, 1, 1);
    GEN_SINH_COSH_COEFS(sinhcosh_max_poly_arg, F_PRECISION + POW2_K,
        sinh_poly_deg_f, "F_PRECISION sinh poly range poly coeffs",
        "SINH_F\t\t");
    _GENPOLY(SINH_F[%%d], SINH_POLY_F(x), -1, odd stride=2, 2*sinh_poly_deg_f + 1);
    SET_POLY_GLOBALS(0, 1, 1, 1);
    GEN_SINH_COSH_COEFS(sinhcosh_max_poly_arg, F_PRECISION + POW2_K,
        cosh_poly_deg_f, "F_PRECISION cosh poly range poly coeffs",
        "COSH_F\t\t");
    _GENPOLY(COSH_F[%%d], COSH_POLY_F(x), -1, even stride=2, 2*cosh_poly_deg_f);
    _GENPOLY(ACC_POW2_F[%%d], SINHCOSH_ODD_POLY_F(x), 0, odd, acc_pow2_deg_f);
    _GENPOLY(ACC_POW2_F[%%d], SINHCOSH_EVEN_POLY_F(x), 0, even, acc_pow2_deg_f);
} else {
    SET_POLY_GLOBALS(3, 1/6, 1, 1);
    GEN_SINH_COSH_COEFS(sinhcosh_max_poly_arg, F_PRECISION + 1,
        sinh_poly_deg_f, "F_PRECISION sinh poly range poly coeffs",
        "SINH_F\t\t");
    _GENPOLY(SINH_F[%%d], SINH_POLY_F(x) (x) +, -3, odd stride=2 c1=0, 2*sinh_poly_deg_f + 3);
    SET_POLY_GLOBALS(2, .5, 1, 1);
    GEN_SINH_COSH_COEFS(sinhcosh_max_poly_arg,
F_PRECISION + 1, cosh_poly_deg_f, "F_PRECISION cosh poly range poly coeffs", "COSH_F\t\t"); _GENPOLY(COSH_F[%%d], COSH_POLY_F(x) ONE +, -2, even stride=2 c0=0, 2*cosh_poly_deg_f + 2); _GENPOLY(ACC_EXP_F[%%d], SINHCOSH_ODD_POLY_F(x), -1, odd, acc_exp_deg_f + 1); _GENPOLY(ACC_EXP_F[%%d], SINHCOSH_EVEN_POLY_F(x), -1, even c0=0, acc_exp_deg_f + 1); if (!ONE_TYPE) { SET_POLY_GLOBALS(1, 1, 1, 1); GEN_SINH_COSH_COEFS(sinhcosh_max_poly_arg, R_PRECISION + POW2_K, sinh_poly_deg_r, "R_PRECISION sinh poly range poly coeffs", "SINH_R\t\t"); _GENPOLY(SINH_R[%%d], SINH_POLY_R(x), -1, odd stride=2, 2*sinh_poly_deg_r + 1); SET_POLY_GLOBALS(0, 1, 1, 1); GEN_SINH_COSH_COEFS(sinhcosh_max_poly_arg, R_PRECISION + POW2_K, cosh_poly_deg_r, "R_PRECISION cosh poly range poly coeffs", "COSH_R\t\t"); _GENPOLY(COSH_R[%%d], COSH_POLY_R(x), 0, even stride=2, 2*cosh_poly_deg_r); _GENPOLY(ACC_POW2_R[%%d], SINHCOSH_ODD_POLY_R(x), 0, odd, acc_pow2_deg_r); _GENPOLY(ACC_POW2_R[%%d], SINHCOSH_EVEN_POLY_R(x), 0, even, acc_pow2_deg_r); } ENDIF } /* ** MISCELLANEOUS SHARED CONSTANTS: ** ------------------------------- ** ** This section of MP code records the current build parameters that ** must be passed on to the functions that use the generated table and ** also generates constants that are not assocaiated with any particular ** function. Begin by recording the current build parameters. */ printf("#define LOG2_K\t\t\t%i\n", LOG2_K); printf("#define POW2_K\t\t\t%i\n", POW2_K); printf("#define NO_FAST\t\t\t%i\n", NO_FAST); printf("#define NO_ACC\t\t\t%i\n", NO_ACC); printf("#define USE_DIVIDE\t\t%i\n", USE_DIVIDE); /* ** Generate a floating point 1.0 for use in expm1 and scaling the input ** argument in the power functions. Also generate 1/ln2 for scaling the ** input argument in exp, expm1 and sinh/cosh and .5 for near ** overflow/underflow fixup. 
*/

PRINT_TBL_COM_VDEF_ITEM("B_PRECISION .5, 1.0 and 2.0", "HALF\t\t", .5);
PRINT_TBL_VDEF_ITEM("ONE\t\t", 1.0);
PRINT_TBL_VDEF_ITEM("TWO\t\t", 2.0);
PRINT_TBL_COM_VDEF_ITEM("B_PRECISION max float", "MAX_FLOAT\t", MP_MAX_FLOAT);
PRINT_TBL_COM_VDEF_ITEM("1/ln2 in B_PRECISION", "RECIP_LN2\t", recip_ln2);

/*
** GENERAL DISCUSSION OF x^y AND log2(x)
** -------------------------------------
**
** This implementation computes the power x^y in three conceptual stages:
**
**     o compute log2(x), with some extra bits of precision
**     o multiply y * log2(x), maintaining the extra precision
**     o evaluate 2 ^ product.
**
** In the actual implementations, the first two steps are combined.
**
**
** DEFINING THE TABLE SIZES:
** -------------------------
**
** The evaluation of log2(x) and 2^product both use table look-up schemes
** to increase accuracy and performance. The number of extra bits of
** precision required for log2(x) is F_EXP_WIDTH - 1 + POW2_K, where
** 2^POW2_K is the number of entries in the 2^x table (see the previous
** discussion on 2^x).
**
** The total amount of extra precision in the log2(x) computation is a
** function of the log2 table size and the argument reduction scheme used.
** By way of explanation, consider calculating log2(f) for f in the
** interval [1,2). Let the table size for the log2 evaluation be 2^LOG2_K
** and let j be the integer such that Fj = 1 + j/2^LOG2_K is closest to f.
** With the above definitions, we consider two possible argument reduction
** schemes:
**
**     With   : z = (f - Fj)/(f + Fj)
**     divide : log2(f) = log2(Fj) + (2/ln2)*[z + z^3/3 + z^5/5 + ...]
**
**     Without: w = (f - Fj)/Fj
**     divide : log2(f) = log2(Fj) + (1/ln2)*[w - w^2/2 + w^3/3 - ... ]
**
** The worst case scenario for accuracy is when f = 1 + 1/2^(LOG2_K + 1).
** This implies that log2(Fj) = 0 and that we can only get extended
** precision in the log2 computation by computing the first "few" terms
** of the series in extended precision.
**
** In the "with divide" case, we compute z in extended precision, and the
** amount of extra precision in the final result is (essentially) the
** alignment shift between z and z^3/3, or 2*LOG2_K + 5.
**
** In the "without divide" case, we compute s = w - w^2/2 in extended
** precision, and the amount of extra precision in the final result is
** (essentially) the alignment shift between s and w^3/3, or 2*LOG2_K + 3.
**
** If we are only considering accuracy, then we should choose LOG2_K and
** POW2_K according to the relationship:
**
**     2*LOG2_K + R = F_EXP_WIDTH - 1 + POW2_K
**
** where R is 5 or 3 depending on whether the argument reduction uses a
** divide or not. However, since the power table is used for fast exp and
** regular exp (and possibly log2 and fast log2) the values of LOG2_K and
** POW2_K may be taken to be bigger than those prescribed by the above
** relation to increase the performance of any or all of the routines
** dependent upon the table. In particular, the default values of LOG2_K
** and POW2_K do not satisfy the above relationship, but were chosen to
** optimize the performance of fast exp and fast pow.
**
**
** COMPUTATION OF LOG2(x)
** ----------------------
**
** The computation of log2(x) proceeds as follows:
**
**     log2(2^I*f) = I + log2(f)
**                 = I + log2(Fj) + log2(f/Fj)
**                 = I + log2(Fj) + p(z)
**
** where f is in [1, 2), Fj = 1 + j/2^LOG2_K, z is the "reduced"
** argument (using one of the two methods described above) and p
** is a polynomial. The form of p depends on the reduction method.
**
**     NOTE: A more detailed discussion of the following
**     two sections is contained in dpml_pow.c
**
**
** Reduction With Divides:
** -----------------------
**
** If the argument reduction for log2(x) is going to use a divide, then
** we need to compute z = [(f - Fj)/(f + Fj)]*(2/ln2) and p(z) is evaluated
** as:
**
**     p(z) = z + z^3*q(z^2)
**
** where
**
**     q(t) = (ln2/2)^2 * sum{ [t*(ln2/2)^2]^n/(2n+3) | n = 0, 1, ... }
**
** It is necessary to compute z to extra precision. If no backup precision
** is available, then z must be computed in hi and lo pieces in order to
** obtain the required accuracy for log2(x). In this case the computation
** proceeds as follows:
**
**     t    = f - Fj
**     s    = (f + Fj)
**     r    = 1/s
**     z    = t*r
**     z_hi = hi_bits(z)
**     f_hi = hi_bits(f)
**     f_lo = lo_bits(f)
**     z_lo = {([(f_hi - Fj)*hi_bits(2/ln2) - z_hi*s] +
**             f_lo*hi_bits(2/ln2)) +
**             [t*lo_bits(2/ln2) - z_hi*f_lo]}*g;
**
**
** Reduction Without Divides:
** --------------------------
**
** If the argument reduction for log2(x) is not going to use a divide, then
** we need to compute z = (f - Fj)/(Fj*ln2) and p(z) is evaluated
** as:
**
**     p(z) = z - z^2*ln2/2 + z^3*q(z)
**
** where
**
**     q(t) = (ln2)^2 * sum{ [-t*ln2]^n/(n+3) | n = 0, 1, ... }
**
** It is necessary to compute s = z - z^2*ln2/2 to extra precision. If no
** backup precision is available, then s must be computed in hi and lo
** pieces in order to obtain the required accuracy for log2(x). In this case
** the computation proceeds as follows:
**
**     t    = f - Fj
**     z    = t*(1/(Fj*ln2))
**     g    = Fj*Fj*(ln2/2)
**     u    = 2*Fj
**     s    = (u - t)*t*g
**     s_hi = hi_bits(s)
**     v    = Fj*s_hi
**     t_hi = hi_bits(t)
**     t_lo = lo_bits(t)
**     s_lo = {[u*(t - v*hi_bits(ln2)) + t_hi^2] +
**             [t_lo*(t + t_hi) - u*v*lo_bits(ln2)]}*g
**
** For the fast pow routine, we use the "no divide" reduction. However,
** we "cheat" on the accuracy of the final result by computing the
** polynomial as
**
**     p(z) = z_hi + z_lo - z^2*q(z)
**
** where
**
**     q(t) = ln2 * sum{ [-t*ln2]^n/(n+2) | n = 0, 1, ... }
**
*/

/*
** CONSTANTS FOR LOG2
** ------------------
**
** When no backup is available, computing the reduced argument requires
** 2/ln2 in hi and lo pieces or ln2/2 in full precision and ln2 in hi
** and lo pieces, depending on whether divide is used or not.
*/

if (!USE_BACKUP) {
    if (USE_DIVIDE) {
        c = 2*recip_ln2;
        PRINT_TBL_COM_VDEF_ITEM("2/ln2 in F_PRECISION and hi/lo",
            "TWO_OVER_LN2\t", c);
        c_hi = bround(c, R_PRECISION);
        PRINT_TBL_VDEF_ITEM("TWO_OVER_LN2_HI\t", c_hi);
        PRINT_TBL_VDEF_ITEM("TWO_OVER_LN2_LO\t", c - c_hi);
    } else {
        PRINT_TBL_COM_VDEF_ITEM("ln2/2 in F_PRECISION",
            "LN2_OVER_TWO\t", .5*ln2);
    }
} ENDIF

/*
** Log Polynomials:
** ----------------
**
** As indicated above, we use two different polynomial log evaluations
** depending on whether division is used or not. When using a divide:
**
**     ln(F/Fj) = 2z + 2*z^3/3 + 2*z^5/5 + ...., z = (F - Fj)/(F + Fj)
**
** or letting u = 2*z/ln2,
**
**     log2(F/Fj) = u + u^3*ln2^2/12 + u^5*ln2^4/80 + ....,
**                = u + u^3*(ln2^2/12 + u^2*ln2^4/80 + u^4*ln2^6/448....)
**                = u + u^3*P(u^2)      ( if no backup precision )
**                = u*Q(u^2)            ( if backup precision )
*/

function divide_log2_poly(x)
{
    auto s, z, k, u, t;

    s = first_term_value;
    if (x != 0) {
        k = 2*first_term + 1;
        z = (x*x)*x_scale;
        t = z;
        while(1) {
            k += 2;
            u = t/k;
            if ((bexp(s) - bexp(u)) > bit_precision)
                break;
            s += u;
            t *= z;
        }
    } ENDIF
    return s*final_scale;
}

/*
** When not using a divide:
**
**     ln(F/Fj) = w - w^2/2 + w^3/3 ... , w = (F - Fj)/Fj
**
** For the accurate pow, we let v = w/ln2, and write the above as:
**
**     log2(F/Fj) = v - v^2*ln2/2 + v^3*ln2^2/3 ...
**                = (v - v^2*ln2/2) + v^3*(ln2^2/3 - v*ln2^3/4 ...)
**                = (v - v^2*ln2/2) + v^3*P(v)  ( if no backup prec )
**
** For fast pow we write the power series as:
**
**     log2(F/Fj) = v - v^2*ln2/2 + v^3*ln2^2/3 ...
**                = v + v^2*(-ln2/2 + v*ln2^2/3 - v^2*ln2^3/4 + ...)
**                = v + v^2*P(v)                ( if no backup prec )
**
** If backup precision is available we can write the series as
**
**     log2(F/Fj) = v - v^2*ln2/2 + v^3*ln2^2/3 ...
**                = v*P(v)
**
** Note that whether using the divide or non-divide form, the reduced
** argument is most negative when j = 1 and F = F0, and is most positive
** when j = 0 and F = F1.
*/

function no_divide_log2_poly(x)
{
    auto s, z, k, u, t;

    s = first_term_value;
    if (x != 0) {
        k = first_term + 2;
        z = x*x_scale;
        t = z;
        while(1) {
            u = t/k;
            if (bexp(s) - bexp(u) > bit_precision)
                break;
            s += u;
            t *= z;
            k++;
        }
    } ENDIF
    return s*final_scale;
}

# define __GEN_LOG_COEFS(term, min, max, func, prec, deg, com, tag) \
    { \
    remes(REMES_FIND_POLYNOMIAL + REMES_RELATIVE_WEIGHT + \
        (term), min, max, func, prec, &deg, &coefs); \
    PRINT_TBL_COM_ADEF_ARRAY(com, tag, deg); \
    }

# define GEN_DIV_LOG_COEFS(max, prec, deg, com, tag) \
    __GEN_LOG_COEFS(REMES_SQUARE_ARG, 0., max, \
        divide_log2_poly, prec, deg, com, tag)

# define GEN_NO_DIV_LOG_COEFS(min, max, prec, deg, com, tag) \
    __GEN_LOG_COEFS(REMES_LINEAR_ARG, min, max, \
        no_divide_log2_poly, prec, deg, com, tag)

log2_table_size = 2^LOG2_K;
min_arg = -1/((2*log2_table_size + 2)*ln2);
max_arg = 1/(2*log2_table_size*ln2);

if (!NO_ACC) {
    if (USE_DIVIDE) {
        c = ln2/2;
        max_div_arg = 2/((4*log2_table_size + 1)*ln2);
        if (USE_BACKUP) {
            SET_POLY_GLOBALS(0, 1, c*c, 1);
            GEN_DIV_LOG_COEFS(max_div_arg, F_PRECISION + 2*LOG2_K + 3,
                acc_log2_deg_f, "F_PRECISION acc log2 poly coeffs",
                "ACC_LOG2_F\t");
            _GENPOLY(ACC_LOG2_F[%%d], ACC_LOG2_POLY_F(t,x), 0, odd stride=2 c0=t, 2*acc_log2_deg_f + 1);
        } else {
            SET_POLY_GLOBALS(1, 1/3, c*c, c*c);
            GEN_DIV_LOG_COEFS(max_div_arg, F_PRECISION + 1,
                acc_log2_deg_f, "F_PRECISION acc log2 poly coeffs",
                "ACC_LOG2_F\t");
            _GENPOLY(ACC_LOG2_F[%%d], ACC_LOG2_POLY_F(t,x), -3, odd stride=2 c0=t c1=0, 2*acc_log2_deg_f + 3);
            if (!ONE_TYPE) {
                /* Get R_PRECISION coefficients - backup prec assumed.
*/ SET_POLY_GLOBALS(0, 1, c*c, 1); GEN_DIV_LOG_COEFS(max_div_arg, R_PRECISION + 2*LOG2_K + 3, acc_log2_deg_r, "R_PRECISION acc log2 poly coeffs", "ACC_LOG2_R\t"); _GENPOLY(ACC_LOG2_R[%%d], ACC_LOG2_POLY_R(t,x), 0, odd stride=2 c0=t, 2*acc_log2_deg_r + 1); } ENDIF } } else /* !USE_DIVIDE */ { if (USE_BACKUP) { SET_POLY_GLOBALS(0, 1, -c, 1); GEN_NO_DIV_LOG_COEFS(min_arg, max_arg, F_PRECISION + LOG2_K + 3, acc_log2_deg_f, "F_PRECISION acc log2 poly coeffs", "ACC_LOG2_F\t"); _GENPOLY(ACC_LOG2_F[%%d], ACC_LOG2_POLY_F(t,x), -1, c0=t, acc_log2_deg_f); } else { SET_POLY_GLOBALS(2, 1/3, -ln2, ln2*ln2); GEN_NO_DIV_LOG_COEFS(min_arg, max_arg, F_PRECISION + 1, acc_log2_deg_f, "F_PRECISION acc log2 poly coeffs", "ACC_LOG2_F\t"); _GENPOLY(ACC_LOG2_F[%%d], ACC_LOG2_POLY_F(t,x), -3, c0=t c1=0 c2=0, acc_log2_deg_f + 3); } if (!ONE_TYPE) { /* ** backup precision is assumed. Also, we can combine the ** addition of the hi bits of log2(x) with the polynomial ** evaluation. */ SET_POLY_GLOBALS(0, 1, -ln2, 1); GEN_NO_DIV_LOG_COEFS(min_arg, max_arg, R_PRECISION + LOG2_K + 3, acc_log2_deg_r, "R_PRECISION acc log2 poly coeffs", "ACC_LOG2_R\t"); _GENPOLY(ACC_LOG2_R[%%d], ACC_LOG2_POLY_R(t,x), -1, c0=t, acc_log2_deg_r + 1); } ENDIF } } ENDIF if (!NO_FAST) { /* ** We assume that we are not using the divide reduction for the ** fast case. Additionally, we assume that if backup precision ** is available, the fast polynomial is the same as the accurate ** polynomial except that the first two terms are computed ** separately and added in afterwards. 
*/ if (USE_BACKUP) printf("#define FAST_LOG2_POLY_F\t\tACC_LOG2_POLY_F\n"); else { SET_POLY_GLOBALS(1, 1/2, -ln2, -ln2); GEN_NO_DIV_LOG_COEFS(min_arg, max_arg, F_PRECISION + 1, fast_log2_deg_f, "F_PRECISION fast log2 poly coeffs", "FAST_LOG2_F\t"); _GENPOLY(FAST_LOG2_F[%%d], FAST_LOG2_POLY_F(t,x), -2, c0=t c1=0 c2=0, fast_log2_deg_f + 2); if (!ONE_TYPE) { _GENPOLY(ACC_LOG2_R[%%d], FAST_LOG2_POLY_R(t,x), -1, c0=t c1=0 c2=0, acc_log2_deg_r + 1); } } } ENDIF /* ** THE LOG2 TABLE ** ---------------- ** ** The actual format of the log2 table depends on whether it will be shared ** between functions and/or data types and whether or not backup precision ** is available. In general, for j = 0, 1, ... 2^LOG2_K, the table needs to ** contain the following values: ** ** Fj = 1 + j/2^LOG2_K ** Rj = 1/(Fj*ln2) ** Lj = log2(Fj) ** ** If there is no back-up data type available, then the values Rj and Lj ** need to be stored in hi and lo pieces. The following table gives the ** required table values: ** ** Function Fj Rj Rj_hi Rj_lo Lj Lj_hi Lj_lo ** ---------------------------------+---+---------------+---------------+ ** fast pow / backup | x | x | x | ** acc pow / backup / divide | x | | x | ** acc pow / backup / no divide | x | x | x | ** fast pow / no backup | x | x x | x x | ** acc pow / no backup / divide | x | | x x | ** acc pow / no backup / no divide | x | x | x x | ** ---------------------------------+---+---------------+---------------+ ** ** Based on the above table and the number of possible combinations ** for sharing of the table, the log table can have many different formats. ** In the interest of time and simplicity, only the two combination ** suitable for building the DPML on Alpha are inlcude here. */ # if (ONE_TYPE && NO_FAST && !USE_BACKUP && USE_DIVIDE) /* ** These macros build the log table for a single, accurate power ** function when backup precision is not available and division is ** used. 
(This is the quad-precision case) */ # define LOG_TABLE_BANNER \ "\n\t * Fj, hi(log2(Fj)) and lo(log2(Fj) in base precision" \ "\n\t *\n\t * offset" \ " row" \ "\n\t" # define PRINT_LOG_TABLE_ACCESS_MACROS(disp) \ printf("#define POW_EVAL_FLAGS\t\tUSE_DIVIDE\n"); \ __PRINT_TABLE_DEF("GET_F(j)\t", F_CHAR, disp); \ __PRINT_TABLE_DEF("LOG_F_HI(j)\t", F_CHAR, disp); \ __PRINT_TABLE_DEF("LOG_F_LO(j)\t", F_CHAR, disp) # define LOG_INDEX_BASE_POS (__LOG2(BITS_PER_F_TYPE) - 3) # define LOG_INDEX_SCALE 3 # define PRINT_LOG_TABLE_ENTRY(j, Fj, Rj, Lj) \ printf( "\t/* %4i */ %#.4" STR(F_CHAR) ", /* %3i */\n", \ BYTES(MP_BIT_OFFSET), Fj, j); \ MP_BIT_OFFSET += BITS_PER_F_TYPE; \ Lj_hi = bround(Lj, F_HI_HALF_PRECISION); \ __PRINT_TABLE_VALUE(F_CHAR, Lj_hi); \ __PRINT_TABLE_VALUE(F_CHAR, Lj - Lj_hi) # elif !(ONE_TYPE || NO_FAST || NO_ACC || USE_DIVIDE) /* ** These macros build the log table for a shared table for both ** accurate and fast pow in two types, the larger of which has no ** backup precision and no divide is used. */ # define LOG_TABLE_BANNER \ "\n\t * Fj, Rj = 1/(Fj*ln2) and Lj = log2(Fj). Lj and Rj are" \ "\n\t * given in hi and low parts. 
Fj and the hi part or Lj are" \ "\n\t * in reduced precision; Rj, lo(Rj) and lo(Lj) in standard" \ "\n\t * precision with hi(Rj) = Rj - lo(Rj)" \ "\n\t *" \ "\n\t * offset row" \ "\n\t" # define PRINT_LOG_TABLE_ACCESS_MACROS(disp) \ __PRINT_TABLE_DEF("GET_F(j)\t", R_CHAR, disp); \ __PRINT_TABLE_DEF("LOG_F_HI(j)\t", R_CHAR, disp); \ __PRINT_TABLE_DEF("RECIP_F(j)\t", F_CHAR, disp); \ __PRINT_TABLE_DEF("RECIP_F_LO(j)\t", F_CHAR, disp); \ __PRINT_TABLE_DEF("LOG_F_LO(j)\t", F_CHAR, disp) # define LOG_INDEX_BASE_POS (__LOG2(BITS_PER_F_TYPE) - 1) # define LOG_INDEX_SCALE 1 # define PRINT_LOG_TABLE_ENTRY(j, Fj, Rj, Lj) \ Lj_hi = bround(Lj, R_PRECISION); \ printf( "\t/* %4i */ %#.4" STR(R_CHAR) ", %#.4" \ STR(R_CHAR) ", /* %3i */\n", BYTES(MP_BIT_OFFSET), \ Fj, Lj_hi, j); \ MP_BIT_OFFSET += 2*BITS_PER_R_TYPE; \ __PRINT_TABLE_VALUE(F_CHAR, Rj); \ __PRINT_TABLE_VALUE(F_CHAR, Rj - bround(Rj, LOG2_K)); \ __PRINT_TABLE_VALUE(F_CHAR, Lj - Lj_hi) # else # error "ERROR: Log table generation for this set of switches NYI" # endif disp = MP_BIT_OFFSET; PRINT_LOG_TABLE_ACCESS_MACROS(disp); printf("#define LOG_INDEX_BASE_POS\t%i \n", LOG_INDEX_BASE_POS); printf("#define LOG_INDEX_SCALE\t\t%i \n", LOG_INDEX_SCALE); TABLE_COMMENT( LOG_TABLE_BANNER ); for (i = 0; i <= log2_table_size; i++) { Fj = 1 + (i/log2_table_size); Rj = 1/(Fj*ln2); Lj = log2(Fj); PRINT_LOG_TABLE_ENTRY( i, Fj, Rj, Lj); } END_TABLE; printf( "#else\n" "\n extern const "STR(B_TYPE)" "STR(MP_TABLE_NAME)"[%i]; \n" "\n#endif\n\n", MP_BIT_OFFSET/BITS_PER_F_TYPE - 1); @end_divert @eval my $outText = MphocEval( GetStream( "divertText" ) ); \ my $defineText = Egrep( "#define", $outText, \$tableText ); \ my $polyText = Egrep( STR(GENPOLY_EXECUTABLE), $tableText, \ \$tableText ); \ $polyText = GenPoly( $polyText ); \ $outText = "$tableText\n\n$defineText\n\n$polyText"; \ my $headerText = GetHeaderText( STR(BUILD_FILE_NAME), \ "Definitions and constants for " . 
\
        "power and related functions", __FILE__); \
    print "$headerText\n\n$outText";

/* end of the MAKE_INCLUDE mphoc code section */

LIBRARY/float128/dpml_tgamma.c

/******************************************************************************
  Copyright (c) 2007-2024, Intel Corp.
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are met:

    * Redistributions of source code must retain the above copyright notice,
      this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    * Neither the name of Intel Corporation nor the names of its contributors
      may be used to endorse or promote products derived from this software
      without specific prior written permission.

  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
  THE POSSIBILITY OF SUCH DAMAGE.
******************************************************************************/

#include "dpml_private.h"
#include "dpml_special_exp.h"

#ifndef BASE_NAME
#    define BASE_NAME TGAMMA_BASE_NAME
#endif

#if !defined F_ENTRY_NAME
#    define F_ENTRY_NAME F_TGAMMA_NAME
#endif

#if USE_BACKUP
#    define LO_PART_DECL
#else
#    define LO_PART pow2_low
#    define LO_PART_DECL , F_TYPE *LO_PART
#endif

extern F_TYPE F_RT_LGAMMA_NAME(F_TYPE , int *);
extern B_TYPE F_EXP_SPECIAL_ENTRY_NAME ( F_TYPE , WORD * LO_PART_DECL );
extern F_TYPE F_LDEXP_NAME( F_TYPE, int );

static const U_INT_64 Inf = 0x7ff0000000000000;
#define INF (((D_UNION *) &Inf)->f)

F_TYPE
F_ENTRY_NAME(F_TYPE x)
{
    EXCEPTION_RECORD_DECLARATION
    F_TYPE y;
    F_TYPE mantissa_lo;
    WORD pow_of_two, bExp, j;
    int signgam = 0;
    F_UNION u;

    u.f = x;
    j = u.F_HI_WORD;
    if ( j & F_SIGN_BIT_MASK) {
        // Input is negative
        if ( (j & F_EXP_MASK) >=
             (((WORD) (F_EXP_BIAS + F_PRECISION)) << F_EXP_POS)) {
            // Argument is a negative integer
            return INF;
        }
    } else if ( x > 171.6243769563027208124443787857704267196259) {
        // Large positive value is guaranteed to overflow
        return INF;
    } else if ( x != x ) {
        return (x);
    }

    // Normal argument, may or may not overflow
    u.f = F_RT_LGAMMA_NAME(x, &signgam);
    j = u.F_HI_WORD;
    bExp = j & F_EXP_MASK;
    if ( bExp == F_EXP_MASK ) {
        // Overflow or invalid
        return (signgam < 0) ? -u.f : u.f;
    } else if ( u.f < -750 ) {
        // Certain underflow
        return (F_TYPE) 0.0;
    } else if ( u.f > 710 ) {
        // Certain overflow
        return signgam < 0 ? -INF : INF;
    } else {
        y = F_EXP_SPECIAL_ENTRY_NAME(u.f, &pow_of_two, &mantissa_lo );
        y += mantissa_lo;
        if ( signgam < 0 )
            y = -y;
        return F_LDEXP_NAME( y, pow_of_two >> POW2_K );
    }
}

LIBRARY/float128/dpml_asinh.c

/******************************************************************************
  Copyright (c) 2007-2024, Intel Corp.
  All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/

#include "dpml_private.h"
#include "sqrt_macros.h"

#undef MAKE_ASINH
#undef MAKE_ACOSH

#if defined(ASINH)
#   define MAKE_ASINH
#   define BASE_NAME ASINH_BASE_NAME
#   define _F_ENTRY_NAME F_ASINH_NAME
#elif defined(ACOSH)
#   define BASE_NAME ACOSH_BASE_NAME
#   define _F_ENTRY_NAME F_ACOSH_NAME
#else
#   error "Must have one of ASINH, ACOSH defined"
#endif

#if !defined(F_ENTRY_NAME)
#   define F_ENTRY_NAME _F_ENTRY_NAME
#endif

/*
Arcsinh & Arccosh
--------------------------------------

This source can be compiled into both Arcsinh and Arccosh routines. The
definitions necessary to create the function follow.

    Function Generation:

        Along with any standard compile time definitions required by the
        dpml the following items should be defined on the compilation
        command line to create the indicated routine.

            Arcsinh : ASINH
            Arccosh : ACOSH

        To create each routine's 'include' file an initial compilation
        should be done using the following definition in addition to
        those above.

            MAKE_INCLUDE

    Selectable Build-time Parameters:

        The definitions below define the minimum "overhang" limits for
        those ranges of the routine with adjustable accuracy bounds. The
        numbers specified in the definitions are the number of binary
        digits of overhang. A complete discussion of these values and
        their use is included in the individual routine documentation.
*/

#define POLY_RANGE_OVERHANG 5
#define REDUCE_RANGE_OVERHANG 5
#define ASYM_RANGE_OVERHANG 7
#define LARGE_RANGE_OVERHANG 7

#if !defined(MAKE_INCLUDE)
#include STR(BUILD_FILE_NAME)
#endif

/*
Arcsinh
--------------------------

    The Arcsinh designs described here are the result of an effort to create
    a fast Arcsinh routine with error bounds near 1/2 lsb. The inherent
    conflict is that, to create fast routines we generally need to give up
    some accuracy, and conversely, to increase accuracy we often must give
    up speed.
    As a result, the design we're presenting defines a user (builder)
    configurable routine. That is, it is set up such that the builder of a
    routine may choose, through the proper setting of parameters, the degree
    of accuracy of the generated routine and hence, indirectly, its speed.

    The Design:

        The overall domain of the Arcsinh function has been divided up into
        six regions or paths as follows:

           (1)        (2)          (3)         (4)       (5)      (6)
        |--------|------------|-----------|-----------|-------|----------|
        0      small     polynomial   reduction  asymptotic large      huge

        (Note: Although the domain of Arcsinh extends from -infinity to
        +infinity, the problem can be considered one of only positive
        arguments through the application of the identity
        asinh(-x) = - asinh(x). )

        Within each region a unique approximation to the Arcsinh function is
        used. Each is chosen for its error characteristics, efficiency and
        the range over which it can be applied.

    1. Small region:

            asinh(x) = x                    (x <= max_small)

        Within the "small" region the Arcsinh function is approximated as
        asinh(x) = x. This is a very quick approximation but it may only be
        applied to small input values. There is effectively no associated
        storage cost. By limiting the magnitude of x the error bound can be
        limited to <= 1/2 lsb.

    2. Polynomial region:

        Within the "polynomial" region the function is approximated as

            asinh(x) = x (1 + x^2 P(x))     (max_small_x