kfj-vspline-4b365417c271/LICENSE

vspline - generic C++ code for creation and evaluation of uniform b-splines

Copyright 2015 - 2018 by Kay F. Jahnke

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
kfj-vspline-4b365417c271/README.rst

===================================================================
vspline - generic C++ code to create and evaluate uniform B-splines
===================================================================

------------
Introduction
------------

vspline aims to provide a free, comprehensive and fast library for uniform B-splines and their use with n-dimensional raster data, using multithreading and SIMD processing.

Uniform B-splines are a method to provide a 'smooth' interpolation over a set of uniformly sampled data points. They are commonly used in signal processing as they have several 'nice' qualities - an in-depth treatment and comparison to other interpolation methods can be found in the paper 'Interpolation Revisited' [CIT2000]_ by Philippe Thévenaz, Member, IEEE, Thierry Blu, Member, IEEE, and Michael Unser, Fellow, IEEE.

While there are several freely available packages of B-spline code, I failed to find one which is comprehensive, efficient and generic at once. vspline attempts to be all that, making use of generic programming in C++11, and of common, but often underused, hardware features in modern processors.

Overall, there is an emphasis on speed, even if this makes the code more complex. I tried to eke as much performance out of the hardware at my disposal as possible, only compromising when the other design goals would have been compromised. Some of the code is quite low-level, but there are reasonably high-level mechanisms to interface with vspline, allowing easy access to its functionality without requiring users to familiarize themselves with the internal workings. The high-level approach is provided via class 'bspline' defined in bspline.h, 'evaluator' objects from eval.h, and via the remap/transform functions defined in transform.h. This high-level code allows using fast, high-quality interpolation with just a few lines of code.
It should be especially attractive to vigra users, since the data handling is done with vigra data types.

While I made an attempt to write portable code, vspline is only tested with g++ and clang++ on Linux. It may work in other environments, but it's unlikely to do so without modification. An installation of Vigra_ is needed to compile; installation of Vc_ is optional but recommended. Some Linux distros offer both vigra and Vc versions suitable for vspline; alternatively, you can build either from source. A vspline debian package_ is now available.

Note: in November 2017, with help from Bernd Gaucke, vspline's companion program pv, which uses vspline heavily, was successfully compiled with 'Visual Studio Platform toolset V141'. While no further tests have been done, I hope that I can soon extend the list of supported platforms.

vspline is relatively new; the current version might qualify as late beta, but I anticipate moving from 0.X releases to a 1.0 release soon, since the code has seen a lot of use by now and there are extensive tests which have verified that it works correctly. I have made efforts to cover 'reasonable' use cases, but there may be corner cases and unexpected scenarios where my code fails. The code is not well shielded against inappropriate parameters. The intended audience is developers rather than end users. vspline should be useful for image processing, volume slicing, and other tasks needing efficient, precise interpolation.

-----
Scope
-----

There are (at least) two different approaches to tackle B-splines as a mathematical problem. The first one is to look at them as a linear algebra problem. Calculating the B-spline coefficients is done by solving a set of equations, which can be codified as banded diagonal matrices with slight disturbances at the top and bottom, resulting from boundary conditions.
The mathematics are reasonably straightforward and can be efficiently coded (at least for lower-degree splines), but I found it hard to generalize to B-splines of higher order.

The second approach to B-splines comes from signal processing, and it's the one which I found most commonly used in the other implementations I studied. It generates the B-spline coefficients by applying a forward-backward recursive digital filter to the data and usually implements boundary conditions by picking appropriate initial causal and anticausal coefficients. Once I had understood the process, I found it elegant and beautiful - and perfectly general, lending itself to the implementation of a body of generic code with the scope I envisioned.

Evaluation of a b-spline requires picking a subset of the coefficients and forming a weighted sum of this subset. So there are two components involved: a DDA to get the coefficients from memory, and the formation of the weighted sum. vspline is designed to do these tasks as quickly as possible, and a large part of its code is dedicated to the evaluation process. As it turns out, merely having a fast evaluation function is not enough, and a good part of vspline deals with moving data around to have them in place for processing at the right moment.
vspline can handle

- real and integer data types and their aggregates [1]_
- coming in strided 1D or nD memory
- a reasonable selection of boundary conditions
- arbitrary spline degrees (currently up to 45)
- arbitrary dimensions of the spline
- in multithreaded code
- using the CPU's vector units if possible

On the evaluation side I provide

- evaluation of the spline at point locations in the defined range
- evaluation of the spline's derivatives
- specialized code for degrees 0 and 1 (nearest neighbour and n-linear)
- mapping of arbitrary coordinates into the defined range
- evaluation of nD arrays of coordinates ('remap' function)
- coordinate-fed remap function ('index_remap')
- functor-based remap, aka 'transform' functions
- functor-based 'apply' function
- restoration of the original data from the coefficients

To produce maximum performance, vspline has a fair amount of collateral code, and some of this code may be helpful beyond vspline:

- range-based multithreading with a thread pool
- functional constructs using vspline::unary_functor*
- forward-backward n-pole recursive filtering*
- separable convolution*
- efficient access to the b-spline basis functions
- extremely precise precalculated constants

\* note that the filtering code and the transform-type routines multithread and vectorize automatically.

The code at the very core of my B-spline prefiltering code evolved from the code by Philippe Thévenaz which he published here_, with some of the boundary condition treatment code derived from formulae which he communicated to me.

Next I needed code to handle multidimensional arrays in a generic fashion in C++. I chose to use Vigra_. Since my work has a focus on signal (and, more specifically, image) processing, it's an excellent choice, as it provides a large body of code with precisely this focus and has well-thought-out, reliable support for multidimensional arrays and small zero-overhead aggregates.
vigra's MultiArray and MultiArrayView types are similar to Numpy_ arrays.

Once I had a prototype up and running, I looked out for further improvements to its speed. While using GPUs is tempting, I chose not to go down this path. Instead I chose to stick with CPU code and use vectorization. Again, I did some research and found Vc_. Vc allowed me to write generic vectorization code for a reasonably large set of targets. I found the performance gain for some data types so enticing that I chose to make my code optionally use Vc. The use of Vc has to be activated by a compile-time flag (USE_VC).

When Vc can't be used - or for data types Vc can't vectorize - vspline uses SIMD emulation: data are aggregated into small vector-friendly parcels, and the expectation is that processing such aggregates may trigger the compiler's autovectorization. I call this technique 'goading' and found it to work surprisingly well. It works especially well for vspline's digital filters, while the b-spline evaluation code, with its more complex arithmetic and frequent DDA, benefits less.

I did all my programming on a Kubuntu_ system, running on an intel(R) Core (TM) i5-4570 CPU, and used GNU gcc_ and clang_ to compile the code in C++11 dialect. While I am confident that the code runs on other CPUs, I have not tested it much with other compilers or operating systems.

.. _here: http://bigwww.epfl.ch/thevenaz/interpolation/
.. _Vigra: http://ukoethe.github.io/vigra/
.. _Vc: https://github.com/VcDevel/Vc
.. _Kubuntu: http://kubuntu.org/
.. _gcc: https://gcc.gnu.org/
.. _clang: http://clang.llvm.org/
.. _package: https://tracker.debian.org/pkg/vspline

.. [CIT2000] Interpolation Revisited by Philippe Thévenaz, Member, IEEE, Thierry Blu, Member, IEEE, and Michael Unser, Fellow, IEEE, in IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 19, NO. 7, JULY 2000, available `online here`_

.. _online here: http://bigwww.epfl.ch/publications/thevenaz0002.pdf

.. [1] I use 'aggregate' here to mean a collection of identical elements, in contrast to what C++ defines as an aggregate type. So aggregates would be pixels, vigra TinyVectors, and, also, complex types.

-------------
Documentation
-------------

vspline uses doxygen to create documentation. You can access the documentation online here: documentation_

I have made an effort to comment my code very extensively. There are probably more comments in vspline than actual code, and this is appropriate, since the code is in large parts complex template metacode, in an attempt to make it as generic as possible.

vspline comes with a fair amount of examples, which at times are a bit rough-and-ready, and less well-groomed than the library code. Nevertheless, the examples should help in understanding how vspline can be put to use. I've also published my bootstrapping code, producing precalculated b-spline prefilter pole values and basis function values in high-precision arithmetic using GSL, BLAS and GNU GMP. Some of the examples have self-testing code to establish that an installation of vspline runs correctly.

-----
Speed
-----

While performance will vary widely from system to system and between different compiles, I'll quote some measurements from my own system. I include benchmarking code (roundtrip.cc in the examples folder). Here are some measurements done with "roundtrip", working on a full HD (1920*1080) RGB image, using single precision floats internally - the figures are averages of several runs::

  testing bc code MIRROR spline degree 3 using SIMD emulation
  avg 32 x prefilter:............................ 12.93750 ms
  avg 32 x restore original data from 1D:........ 6.03125 ms
  avg 32 x transform with ready-made bspline:.... 46.18750 ms
  avg 32 x restore original data: ............... 15.90625 ms

  testing bc code MIRROR spline degree 3 using Vc
  avg 32 x prefilter:............................ 13.12500 ms
  avg 32 x restore original data from 1D:........ 5.50000 ms
  avg 32 x transform with ready-made bspline:.... 21.40625 ms
  avg 32 x restore original data: ............... 10.15625 ms

As can be seen from these test results, using Vc on my system speeds evaluation up a good deal. When it comes to prefiltering, a lot of time is spent buffering data to make them available for fast vector processing. The time spent on actual calculations is much less. Therefore, prefiltering for higher-degree splines doesn't take much more time (when using Vc)::

  testing bc code MIRROR spline degree 5 using Vc
  avg 32 x prefilter:............................ 13.59375 ms

  testing bc code MIRROR spline degree 7 using Vc
  avg 32 x prefilter:............................ 15.00000 ms

Using double precision arithmetic, vectorization doesn't help as much, and prefiltering is actually slower on my system when using Vc. Doing a complete roundtrip run on your system should give you an idea about which mode of operation best suits your needs.

These speed measurements are a bit of a moving target, since the code base is still under development - some modifications make some things faster and other things slower. But the figures above should give you an idea what to expect.

Since vspline can make heavy demands on memory bandwidth and CPU usage, it also lends itself to benchmarking a given system, running it full throttle on all cores with SIMD instructions. vspline is written to auto-vectorize well, so even if Vc is not used, if the relevant compiler switches are set (-march=native, -mavx2, -O3 or -Ofast etc.) the code will use SIMD instructions if the compiler autovectorizes it, so vspline can also serve to check how well different compilers autovectorize.

-------
History
-------

Some years ago, I needed uniform B-splines for a project in python. I looked for code in C which I could easily wrap with cffi_, as I intended to use it with pypy_, and found K. P. Esler's libeinspline_.
I proceeded to code the wrapper, which also contained a layer to process Numpy_ nD-arrays, but at the time I did not touch the C code in libeinspline. The result of my efforts is still available from the repository_ I set up at the time. I did not use the code much and occupied myself with other projects, until my interest in B-splines was rekindled sometime in late 2015. I had a few ideas I wanted to try out which would require working with the B-spline C code itself. I started out modifying code in libeinspline, but after some initial progress I felt constricted by the code base in C, the linear algebra approach, the limitation to cubic splines, etc. Being more of a C++ programmer with a fondness for generic programming, I first re-coded the core of libeinspline in C++ (to continue using my python interface), but then, finally, I decided to start afresh in C++, abandon the link to libeinspline, and not have a python interface (for now). This is the result of my work.

I have chosen the name vspline because I rely heavily on two libraries starting with 'V', namely Vigra_ and Vc_.

The code has been under development for several years now. Once the initial design turned out to be workable, I went on to go through all the components cyclically, trying to make everything rock solid and precise. While I still assign 0.X serial numbers, the code is approaching 1.0 and the high-level API is already pretty stable.

.. _cffi: https://cffi.readthedocs.org/en/latest/
.. _pypy: http://pypy.org/
.. _libeinspline: http://einspline.sourceforge.net/
.. _Numpy: http://www.numpy.org/
.. _repository: https://bitbucket.org/kfj/python-bspline
.. _documentation: https://kfj.bitbucket.io

kfj-vspline-4b365417c271/basis.h

/************************************************************************
*
*   vspline - a set of generic tools for creation and evaluation
*             of uniform b-splines
*
*   Copyright 2015 - 2018 by Kay F. Jahnke
*
*   The git repository for this software is at
*
*   https://bitbucket.org/kfj/vspline
*
*   Please direct questions, bug reports, and contributions to
*
*   kfjahnke+vspline@gmail.com
*
*   Permission is hereby granted, free of charge, to any person
*   obtaining a copy of this software and associated documentation
*   files (the "Software"), to deal in the Software without
*   restriction, including without limitation the rights to use,
*   copy, modify, merge, publish, distribute, sublicense, and/or
*   sell copies of the Software, and to permit persons to whom the
*   Software is furnished to do so, subject to the following
*   conditions:
*
*   The above copyright notice and this permission notice shall be
*   included in all copies or substantial portions of the
*   Software.
*
*   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND
*   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
*   OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
*   NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
*   HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
*   WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
*   FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
*   OTHER DEALINGS IN THE SOFTWARE.
*
************************************************************************/

/*! \file basis.h

    \brief Code to calculate the value of the B-spline basis function
    and its derivatives.
    This file begins with some collateral code used to 'split'
    coordinates into an integral part and a small real remainder. This
    split is used in b-spline evaluation and fits thematically with the
    remainder of the file, which deals with the basis function.

    vspline only uses the B-spline basis function values at multiples
    of 0.5. With these values it can construct its evaluators, which in
    turn are capable of evaluating the spline at real coordinates.
    Since these values aren't 'too many' but take some effort to
    calculate precisely, they are provided as precalculated constants
    in poles.h, which also holds the prefilter poles.

    The basis function values at half unit steps are used for
    evaluation via a 'weight matrix'. This is a matrix of numbers which
    can yield the value of a b-spline by employing a simple fixed-time
    matrix/vector multiplication (which also vectorizes well). In a
    similar way, a 'basis_functor' can be set up from these values
    which can provide the value of the basis function for arbitrary
    real arguments.

    For a discussion of the b-spline basis function, have a look at

    http://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/spline/B-spline/bspline-basis.html
*/

#ifndef VSPLINE_BASIS_H
#define VSPLINE_BASIS_H

#include "common.h"

// poles.h has precomputed basis function values sampled at n * 1/2.
// These values were calculated to very high precision in a separate
// program (see bootstrap.cc) using GNU GMP, GSL and BLAS.

#include "poles.h"

namespace vspline {

/// coordinates are split into an integral part and a remainder. this is
/// used for weight generation, and also for calculating the basis function
/// value. The split is done differently for odd and even splines.
///
/// Note how the initial call to std::floor produces a real type, which
/// is used to subtract from 'v', yielding the remainder in 'fv'. Only after
/// having used this real representation of the integral part, it is cast
/// to an integral type by assigning it to 'iv'.
/// This is the most efficient route, better than producing an
/// integral-typed integral part directly and subtracting that from 'v',
/// which would require another conversion. Technically, one might
/// consider this 'split' as a remainder division by 1.

template < typename ic_t , typename rc_t >
void odd_split ( rc_t v , ic_t& iv , rc_t& fv )
{
  rc_t fl_i = floor ( v ) ;
  fv = v - fl_i ;
  vspline::assign ( iv , fl_i ) ;
}

// roll out the split function for vigra::TinyVectors

template < typename ic_t , typename rc_t , int N >
void odd_split ( vigra::TinyVector < rc_t , N > v ,
                 vigra::TinyVector < ic_t , N > & iv ,
                 vigra::TinyVector < rc_t , N > & fv )
{
  for ( int d = 0 ; d < N ; d++ )
    odd_split ( v[d] , iv[d] , fv[d] ) ;
}

/// for even splines, the integral part is obtained by rounding. when the
/// result of rounding is subtracted from the original coordinate, a value
/// between -0.5 and 0.5 is obtained which is used for weight generation.

// TODO: there is an issue here: the lower limit for an even spline
// is -0.5, which should be rounded *towards* zero, but std::round rounds
// away from zero. The same applies to the upper limit, which should
// also be rounded towards zero, not away from it. Currently I am working
// around the issue by increasing the spline's headroom by 1 for even
// splines, but I'd like to be able to use rounding towards zero. It might
// be argued, though, that potentially producing out-of-range access by
// values which are only just outside the range is cutting it a bit fine,
// and the extra headroom for even splines makes the code more robust, so
// accepting the extra headroom would be just as acceptable as the widened
// right brace for some splines which saves checking the incoming
// coordinate against the upper limit. The code increasing the headroom
// is in bspline.h, in bspline's constructor just before the call to
// setup_metrics.
template < typename ic_t , typename rc_t >
void even_split ( rc_t v , ic_t& iv , rc_t& fv )
{
  rc_t fl_i = round ( v ) ;
  fv = v - fl_i ;
  vspline::assign ( iv , fl_i ) ;
}

template < typename ic_t , typename rc_t , int N >
void even_split ( vigra::TinyVector < rc_t , N > v ,
                  vigra::TinyVector < ic_t , N > & iv ,
                  vigra::TinyVector < rc_t , N > & fv )
{
  for ( int d = 0 ; d < N ; d++ )
    even_split ( v[d] , iv[d] , fv[d] ) ;
}

/// bspline_basis_2 yields the value of the b-spline basis function
/// at multiples of 1/2, for which vspline has precomputed values.
/// Instead of passing real values x which are multiples of 1/2, this
/// routine takes the doubled argument, so instead of calling it with
/// x = 0.5, you call it with x2 = 1.
///
/// this is a helper routine to offer convenient access to vspline's
/// precomputed basis function values, and it also handles the special
/// case of degree 0, x = -1/2, where the b-spline basis function
/// is not symmetric and yields 1.
///
/// Inside vspline, this routine is only used by calculate_weight_matrix.
/// User code should rarely need it, but I hold on to it as a separate
/// entity. The code also handles derivatives. This is done by recursion,
/// which is potentially very slow (for high derivatives), so this routine
/// should only be used for 'moderate' derivative values. For fast access
/// to the basis function's derivatives, use of vspline::basis_functor is
/// recommended instead.

template < class real_type >
real_type bspline_basis_2 ( int x2 , int degree , int derivative = 0 )
{
  real_type result ;

  if ( derivative == 0 )
  {
    if ( degree == 0 )
    {
      if ( x2 == -1 || x2 == 0 )
        result = real_type ( 1 ) ;
      else
        result = real_type ( 0 ) ;
    }
    else if ( abs ( x2 ) > degree )
    {
      result = real_type ( 0 ) ;
    }
    else
    {
      // here we pick the precomputed value:
      const vspline::xlf_type * pk
        = vspline_constants::precomputed_basis_function_values [ degree ] ;
      result = real_type ( pk [ abs ( x2 ) ] ) ;
    }
    return result ;
  }
  else
  {
    // recurse.
    // this will only produce recursion until 'derivative' becomes
    // zero, at which point precomputed values are picked.
    --derivative ;
    return   bspline_basis_2 ( x2 + 1 , degree - 1 , derivative )
           - bspline_basis_2 ( x2 - 1 , degree - 1 , derivative ) ;
  }
}

/// a 'weight matrix' is used to calculate a set of weights for a given
/// remainder part of a real coordinate. The 'weight matrix' is multiplied
/// with a vector containing the power series of the given remainder,
/// yielding a set of weights to apply to a window of b-spline
/// coefficients.
///
/// The routine 'calculate_weight_matrix' originated from vigra, but I
/// rewrote it to avoid calculating values of derivatives of the basis
/// function by using recursion. The content of the 'weight matrix' is now
/// calculated directly with a forward iteration starting from precomputed
/// basis function values; the derivatives needed are formed by repeatedly
/// differencing these values. The recursive formulation in vigra makes
/// sense, since the degree of the spline is a template argument in vigra
/// and the recursive formulation can be evaluated at compile-time,
/// allowing for certain optimizations. But in vspline, spline degree is a
/// run-time variable, and vspline offers calculation of splines of
/// degrees up to (currently) 45, which aren't feasibly calculable by
/// recursion, with a recursion calling itself twice for every step: this
/// would take a very long time or exceed the system's capacity.
/// Analyzing the recursive implementation, it can be seen that it
/// produces a great many redundant calculations, which soon exceed
/// reasonable limits. With vigra's maximal spline degree the load is just
/// about manageable, but beyond that (degree 19 or so at the time of this
/// writing) it's clearly no-go.
///
/// The forward iteration is reasonably fast, even for high spline
/// degrees, while my previous implementation did slow down very
/// noticeably from, say, degree 20, making it unusable for high spline
/// degrees. I really only noticed the problem after raising the maximal
/// degree vspline can handle, following the rewrite of bootstrap.cc
/// using arbitrary-precision maths.
///
/// A weight matrix is also used by 'basis_functor', in a similar way to
/// how it's used for weight generation.

template < class target_type >
void calculate_weight_matrix ( vigra::MultiArray < 2 , target_type > & res )
{
  int order = res.shape ( 0 ) ;
  int degree = order - 1 ;
  int derivative = order - res.shape ( 1 ) ;

  // guard against impossible parameters

  if ( derivative >= order )
    return ;

  vspline::xlf_type faculty = 1 ; // why xlf_type? because integral types overflow

  // we do the calculations for each row of the weight matrix in the same
  // type as the precomputed basis function values, only casting down to
  // 'target_type' once the row is ready

  vspline::xlf_type der_line [ degree + 1 ] ;

  for ( int row = 0 ; row < order - derivative ; row++ )
  {
    if ( row > 1 )
      faculty *= row ;

    // obtain pointers to the beginning of the row and past its end

    vspline::xlf_type * p_first = der_line ;
    vspline::xlf_type * p_end = der_line + degree + 1 ;

    // now we want to pick basis function values. The first row to go into
    // the weight matrix gets basis function values for 'degree', the next
    // one values for degree-1, but differenced once, the next one values
    // for degree-2, differenced twice, etc.
    // Picking the values is done so that for odd degrees, basis function
    // values for whole x are picked; for even spline degrees, values
    // 1/2 + n, n E N are picked. Why so? When looking at the recursion
    // used in bspline_basis_2, you can see that each step of the recursion
    // forms the difference between a value 'to the left' and a value 'to
    // the right' of the current position (x2-1 and x2+1).
    // If you follow this pattern, you can see that, depending on the
    // degree, the recursion will run down to either all even or all odd
    // x2, so this is what we pick, since the other values will not
    // participate in the result at all.
    // Note how, to pick the basis function values, we use bspline_basis_2,
    // which 'knows' how to get the right precomputed values. Contrary to
    // my previous implementation, it is *not* used to calculate the
    // derivatives, it is only used as a convenient way to pick the
    // precomputed values.

    int m = degree - derivative - row ;

    if ( m == 0 )
    {
      der_line[0] = 1 ;
      ++p_first ;
    }
    else if ( degree & 1 )
    {
      for ( int x2 = - m + 1 ; x2 <= m - 1 ; x2 += 2 )
      {
        *(p_first++) = bspline_basis_2 < vspline::xlf_type > ( x2 , m ) ;
      }
    }
    else
    {
      for ( int x2 = - m ; x2 <= m ; x2 += 2 )
      {
        *(p_first++) = bspline_basis_2 < vspline::xlf_type > ( x2 , m ) ;
      }
    }

    // fill the remainder of the line with zeroes

    vspline::xlf_type * p = p_first ;
    while ( p < p_end )
      *(p++) = 0 ;

    // now we have the initial basis function values. We need to
    // differentiate the sequence, possibly several times. We have the
    // initial values flush with the line's left bound, so we perform the
    // differentiation back to front, and after the last differentiation
    // the line is full.
    // Note how this process might be abbreviated further by exploiting
    // symmetry relations (most rows are symmetric or antisymmetric).
    // I refrain from doing so (for now) and I suspect that this may even
    // be preferable in respect to error propagation (TODO: check).

    for ( int d = m ; d < degree ; d++ )
    {
      // deposit the first difference after the last basis function value

      vspline::xlf_type * put = p_first ;

      // and form the difference to the value before it.

      vspline::xlf_type * pick = put - 1 ;

      // now form differences back to front

      while ( pick >= der_line )
      {
        *put = *pick - *put ;
        --put ;
        --pick ;
      }

      // since we have nothing left 'to the left', where another zero
      // would be, we simply invert the sign (*put = 0 - *put).
      *put = - *put ;

      // The next iteration has to start one place further to the right,
      // since there is now one more value in the line

      p_first++ ;
    }

    // the row is ready and can now be assigned to the corresponding row
    // of the weight matrix after applying the final division by 'faculty'
    // and, possibly, downcasting to 'target_type'.

    // we store to a MultiArray, which is row-major, so storing as we do
    // places the results in memory in the precise order in which we want
    // to use them later on in the weight calculation.

    for ( int k = 0 ; k < degree + 1 ; k++ )
      res ( k , row ) = der_line[k] / faculty ;
  }
}

/// basis_functor is an object producing the b-spline basis function value
/// for given arguments, or optionally a derivative of the basis function.
/// While basis_functor can produce single basis function values for
/// single arguments, it can also produce a set of basis function values
/// for a given 'delta'. This set is a unit-spaced sampling of the basis
/// function sampled at n + delta for all n E N. Such samplings are used
/// to evaluate b-splines; they constitute the set of weights which have
/// to be applied to a set of b-spline coefficients to form the weighted
/// sum which is the spline's value at a given position.
/// The calculation is done by using a 'weight matrix'. The columns of the
/// weight matrix contain the coefficients for the partial polynomials
/// defining the basis function for the corresponding interval. In
/// 'general' evaluation, all partial polynomials are evaluated. To obtain
/// single basis function values, we pick out a single column only. By
/// evaluating the partial polynomial for this slot, we obtain a single
/// basis function value.
///
/// This functor provides the value(s) in constant time and there is no
/// recursion. Setting up the functor costs a bit of time (for calculating
/// the 'weight matrix'); evaluating it merely evaluates the partial
/// polynomial(s), which is quick by comparison.
/// So this is the way to go if basis function values are needed -
/// especially if there is a need for several values of a given basis
/// function. I refrain from giving a 'one-shot' function using a
/// basis_functor - this is easily achieved by coding
///
/// b = basis_functor ( degree , derivative ) ( x ) ;
///
/// ... which does the trick, but 'wastes' the weight matrix.
///
/// The weight matrix and all variables use 'math_type', which defaults
/// to xlf_type, vspline's most exact type. By instantiating with a
/// lesser type, the computation can be done more quickly, but less
/// precisely.

template < typename math_type = vspline::xlf_type >
struct basis_functor
{
  vigra::MultiArray < 2 , math_type > weight_matrix ;
  int degree ;

  basis_functor ( int _degree , int _derivative = 0 )
  : weight_matrix ( _degree + 1 , _degree + 1 - _derivative ) ,
    degree ( _degree )
  {
    calculate_weight_matrix ( weight_matrix ) ;
  } ;

  // we need a default constructor to create TinyVectors of this type

  basis_functor()
  : weight_matrix ( 1 , 1 ) ,
    degree ( 0 )
  { } ;

  /// operator() taking a column index and a remainder. If these values
  /// are known already, the only thing left to do is the evaluation of
  /// the partial polynomial. Note that this overload is not safe for
  /// arbitrary x; it's assumed that calling code makes sure no invalid
  /// arguments are passed - as in the overload below.

  // TODO might generalize to allow vectorized operation

  math_type operator() ( int x , math_type delta ) const
  {
    math_type result = weight_matrix ( x , 0 ) ;
    math_type power = 1 ;

    // remaining rows, if any, refine the result

    for ( int row = 1 ; row < weight_matrix.shape(1) ; row++ )
    {
      power *= delta ;
      result += power * weight_matrix ( x , row ) ;
    }
    return result ;
  }

  /// operator() taking an arbitrary argument. This is the overload which
  /// will likely be called from user code. The argument is clamped and
  /// split, and the split value is fed to the previous overload.
/// This routine provides a single result for a single argument and /// is used if the basis function itself needs to be evaluated, which /// doesn't happen much inside vspline. Access to sets of basis function /// values used as weights in b-spline evaluation is coded below. math_type operator() ( math_type rx ) const { int x ; math_type delta ; // we split the argument into an integer and a small real remainder if ( degree & 1 ) vspline::odd_split ( rx , x , delta ) ; else { if ( degree == 0 ) { if ( rx >= -.5 && rx < 0.5 ) return 1 ; return 0 ; } vspline::even_split ( rx , x , delta ) ; } x = degree / 2 - x ; if ( x < 0 || x >= weight_matrix.shape(0) ) { return 0 ; } return operator() ( x , delta ) ; } /// operator() overload to produce a set of weights for a given /// delta in [-.5,.5] or [0,1]. This set of weights is needed if /// a b-spline has to be evaluated at some coordinate k + delta, /// where k is a whole number. For this evaluation, a set of /// coefficients has to be multiplied with a set of weights, and /// the products summed up. So this routine provides the set of /// weights. It deposits weights for the given delta at the location /// 'result' points to. target_type and delta_type may be fundamentals /// or simdized types. template < class target_type , class delta_type > void operator() ( target_type* result , const delta_type & delta ) const { target_type power ( delta ) ; auto factor_it = weight_matrix.begin() ; auto end = weight_matrix.end() ; // the result is initialized with the first row of the 'weight matrix'. // We save ourselves multiplying it with delta^0. 
for ( int c = 0 ; c <= degree ; c++ ) { result[c] = *factor_it ; ++factor_it ; } if ( degree ) { for ( ; ; ) { for ( int c = 0 ; c <= degree ; c++ ) { result[c] += power * *factor_it ; ++factor_it ; } if ( factor_it == end ) { // avoid next multiplication if exhausted, break now break ; } // otherwise produce next power(s) of delta(s) power *= target_type ( delta ) ; } } } } ; /// Implementation of the Cox-de Boor recursion formula to calculate /// the value of the bspline basis function for arbitrary real x. /// This code was taken from vigra but modified to take the spline degree /// as a parameter. Since this routine uses recursion, its usefulness /// is limited to smaller degrees. /// /// This routine operates in real numbers and calculates the basis function /// value for arbitrary real x, but it suffers from accumulating errors, /// especially when the recursion is deep, so the results are not uniformly /// precise. /// /// This code is expensive for higher spline orders because the routine /// calls itself twice recursively, so the cost grows exponentially with /// the spline's degree. Luckily there are ways around using this routine at all /// - whenever we need the b-spline basis function value in vspline, it is at /// multiples of 1/2, and poles.h has precomputed values for all spline /// degrees covered by vspline. The value of the basis function itself /// can be obtained by using a vspline::basis_functor, which performs in /// fixed time and is set up quickly. /// /// I leave this code in here for reference purposes - it's good to have /// another route to the basis function values, see self_test.cc. template < class real_type > real_type cdb_bspline_basis ( real_type x , int degree , int derivative = 0 ) { if ( degree == 0 ) { if ( derivative == 0 ) return ( x < real_type(0.5) && real_type(-0.5) <= x ) ?
real_type(1.0) : real_type(0.0) ; else return real_type(0.0); } if ( derivative == 0 ) { real_type n12 = real_type((degree + 1.0) / 2.0); return ( ( n12 + x ) * cdb_bspline_basis ( x + real_type(0.5) , degree - 1 , 0 ) + ( n12 - x ) * cdb_bspline_basis ( x - real_type(0.5) , degree - 1 , 0 ) ) / degree; } else { --derivative; return cdb_bspline_basis ( x + real_type(0.5) , degree - 1 , derivative ) - cdb_bspline_basis ( x - real_type(0.5) , degree - 1 , derivative ) ; } } /// Gaussian approximation to B-spline basis function. This routine /// approximates the basis function of degree spline_degree for real x. /// I checked for all degrees up to 45. The partition of unity quality of the /// resulting reconstruction filter is okay for larger degrees, the cumulated /// error over the covered interval is quite low. Still, as the basis function /// is never actually evaluated in vspline (whenever it's needed, it is needed /// at n * 1/2 and we have precomputed values for that) there is not much point /// in having this function around. I leave the code in for now. template < typename real_type > real_type gaussian_bspline_basis_approximation ( real_type x , int degree ) { real_type sigma = ( degree + 1 ) / 12.0 ; return real_type(1.0) / sqrt ( real_type(2.0 * M_PI) * sigma ) * exp ( - ( x * x ) / ( real_type(2.0) * sigma ) ) ; } } ; // end of namespace vspline #endif // #define VSPLINE_BASIS_H
/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F.
Jahnke */ /* */ /* The git repository for this software is at */ /* */ /* https://bitbucket.org/kfj/vspline */ /* */ /* Please direct questions, bug reports, and contributions to */ /* */ /* kfjahnke+vspline@gmail.com */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /// \file bootstrap.cc /// /// \brief Code to calculate b-spline basis function values at half /// unit steps, and prefilter poles. /// /// These calculations are done using GNU GSL, BLAS and GNU GMP /// which are not used in other parts of vspline. This file also has /// code to verify that the precomputed data in vspline/poles.h /// are valid.
TODO make that a separate program /// /// In contrast to my previous implementation of generating the /// precomputed values (prefilter_poles.cc, now gone), I don't /// compute the values one-by-one using recursive functions, but /// instead build them layer by layer, each layer using its /// predecessor. This makes the process very fast, even for /// large spline degrees. /// /// In my tests, I found that with the very precise basis function /// values obtained with GMP, I could get GSL to provide raw values /// for the prefilter poles up to degree 45, above which it would /// miss roots. So this is where I set the limit, and I think it's /// unlikely anyone would ever want more than degree 45. If they /// do, they'll have to find other polynomial root code. /// /// compile: g++ -O3 -std=gnu++11 bootstrap.cc -obootstrap \ /// -lgmpxx -lgmp -lgsl -lblas -pthread /// /// run by simply issuing './bootstrap' on the command line. You may /// want to redirect output to a file. The output should be equal /// to the values in vspline/poles.h
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
#include <cmath>
#include <algorithm>
#include <gmpxx.h>
#include <gsl/gsl_poly.h>
// #include <eigen3/unsupported/Eigen/Polynomials>
#include <vspline/common.h>
// In my trials, I could take the spline degree up to 45 when calculating // the prefilter poles. The basis function value can be found for arbitrary // degrees.
#define degree_limit 46
// when using mpf_class from GMP, we use 512 bits of precision.
#define mpf_bits 512
// we want to build a table of basis function values at multiples // of 1/2 mimicking the Cox-de Boor recursion. We only need one half // of the table since the basis function is symmetric (b(x) == b(-x)) // - except for degree 0, which needs special treatment. // Note that we 'invert' the Cox-de Boor recursion and generate // the result iteratively, starting with degree 0. This results in // every single difference relation being executed exactly once, // which makes the iterative solution very fast.
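The layer-by-layer build described above can be sketched in plain double precision, without GMP. This is a hedged illustration only: the table layout, the helper lambda and the name `make_basis_table` are made up for the sketch and are not part of bootstrap.cc, which works in exact rational arithmetic instead.

```cpp
#include <vector>
#include <cmath>
#include <cassert>

// build a table of b-spline basis function values at half-unit steps by
// iterating the (inverted) Cox-de Boor recursion from degree 0 upwards.
// x2 is the doubled argument, so integral x2 addresses the basis function
// at multiples of 1/2.
std::vector<std::vector<double>> make_basis_table ( int max_degree )
{
  // one column more than strictly needed, so out-of-range access yields 0
  std::vector<std::vector<double>> table
    ( max_degree + 1 , std::vector<double> ( max_degree + 2 , 0.0 ) ) ;

  // degree 0: the basis function is 1 in [-.5,.5)
  table[0][0] = 1.0 ;

  // symmetric read access, mapping negative x2 to positive; degree 0
  // has no symmetry and is treated specially
  auto get = [&] ( int degree , int x2 ) -> double
  {
    int column ;
    if ( degree == 0 )
      column = ( x2 == -1 || x2 == 0 ) ? 0 : 1 ;
    else
      column = x2 >= 0 ? x2 : -x2 ;
    if ( column > max_degree + 1 )
      column = max_degree + 1 ;
    return table [ degree ] [ column ] ;
  } ;

  for ( int degree = 1 ; degree <= max_degree ; degree++ )
  {
    for ( int x2 = 0 ; x2 <= degree ; x2++ )
    {
      // one difference relation per table entry, using the previous layer
      double f1 = get ( degree - 1 , x2 + 1 ) ;
      double f2 = get ( degree - 1 , x2 - 1 ) ;
      table [ degree ] [ x2 ]
        = ( f1 * ( degree + 1 + x2 ) + f2 * ( degree + 1 - x2 ) )
          / ( 2.0 * degree ) ;
    }
  }
  return table ;
}
```

For degree 2 this reproduces the familiar values b2(0) = 3/4 and b2(1) = 1/8, matching the entries bootstrap.cc computes exactly as mpq_class fractions.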
// Note the use of x2 throughout, this is the doubled argument, // so that we can operate in integral maths throughout the basis // function calculations (we use mpq_class for arbitrary-sized // fractions, so there is no error at all) // We make the table one column wider than strictly necessary to // allow access to x2 outside the range, in which case we want to // access a location containing 0. mpq_class basis_table [ degree_limit ] [ degree_limit + 1 ] ; // we want to access the table both for reading and writing, mapping // negative x2 to positive ones and treating access to the first row // (for degree 0) specially. Once the table is filled, we can access // the table at arbitrary x2. mpq_class & access_table ( int degree , int x2 ) { if ( degree >= degree_limit || degree < 0 ) throw std::invalid_argument ( "access with degree out of bounds" ) ; int column ; // at degree 0, we have no symmetry: the basis function is // 1 for x2 == -1 and x2 == 0. if ( degree == 0 ) { if ( x2 == -1 || x2 == 0 ) column = 0 ; else column = 1 ; } // for all other degrees the basis function is symmetric else column = x2 >= 0 ? x2 : -x2 ; // for out-of-range x2, land in the last column, which is 0. column = std::min ( column , degree_limit ) ; return basis_table [ degree ] [ column ] ; } // subroutine filling the table at a specific x2 and degree. 
// we rely on the table being filled correctly up to degree - 1 void _fill_basis_table ( int x2 , int degree ) { // look up the neighbours one degree down and one to the // left/right mpq_class & f1 ( access_table ( degree - 1 , x2 + 1 ) ) ; mpq_class & f2 ( access_table ( degree - 1 , x2 - 1 ) ) ; // use the recursion to calculate the current value and store // that value at the appropriate location in the table access_table ( degree , x2 ) = ( f1 * ( degree + 1 + x2 ) + f2 * ( degree + 1 - x2 ) ) / ( 2 * degree ) ; } // to fill the whole table, we initialize the single value // in the first row, then fill the remainder by calling the // single-value subroutine for all non-zero locations void fill_basis_table() { access_table ( 0 , 0 ) = 1 ; for ( int degree = 1 ; degree < degree_limit ; degree++ ) { for ( int x2 = 0 ; x2 <= degree ; x2++ ) _fill_basis_table ( x2 , degree ) ; } } // routine for one iteration of Newton's method applied to // a polynomial with ncoeff coefficients stored at pcoeff. // the polynomial and it's first derivative are computed // at x, then the quotient of function value and derivative // is subtracted from x, yielding the result, which is // closer to the zero we're looking for. template < typename dtype > void newton ( dtype & result , dtype x , int ncoeff , dtype * pcoeff ) { dtype * pk = pcoeff + ncoeff - 1 ; // point to last pcoefficient dtype power = 1 ; // powers of x int xd = 1 ; // ex derivative, n in (x^n)' = n(x^(n-1)) dtype fx = 0 ; // f(x) dtype dfx = 0 ; // f'(x) // we work our way back to front, starting with x^0 and raising // the power with each iteration fx += power * *pk ; for ( ; ; ) { --pk ; if ( pk == pcoeff ) break ; dfx += power * xd * *pk ; xd++ ; power *= x ; fx += power * *pk ; } power *= x ; fx += power * *pk ; result = x - fx / dfx ; } // calculate_prefilter_poles relies on the table with basis function // values having been filled with fill_basis_table(). 
It extracts // the basis function values at even x2 (corresponding to whole x) // as double precision values, takes these to be coefficients of // a polynomial, and uses gsl and blas to obtain the roots of this // polynomial. The real parts of those roots with absolute value less // than one are the raw values of the filter poles we're after. // Currently I don't have an arbitrary-precision root finder, so I // use the double precision one from GSL and submit the 'raw' result // to polishing code in high precision arithmetic. This only works // as far as the root finder can go, I found from degree 46 onwards // it fails to find some roots, so that's how far I take it for now. // The poles are stored in a std::vector of mpf_class and returned. // These values are precise to mpf_bits, since after using gsl's // root finder in double precision, they are polished as mpf_class // values with the newton method, so the very many digits after the // decimal point we print out below are all proper significant digits. void calculate_prefilter_poles ( std::vector < mpf_class > & res , int degree ) { const int r = degree / 2 ; // we need storage for the coefficients of the polynomial // first in double precision to feed gsl's root finder, // then in mpf_class, to do the polishing double coeffs [ 2 * r + 1 ] ; mpf_class mpf_coeffs [ 2 * r + 1 ] ; // here is space for the roots we want to compute double roots [ 4 * r + 2 ] ; // we fetch the coefficients from the table of b-spline basis // function values at even x2, corresponding to whole x double * pcoeffs = coeffs ; mpf_class * pmpf_coeffs = mpf_coeffs ; for ( int x2 = -r * 2 ; x2 <= r * 2 ; x2 += 2 , pcoeffs++ , pmpf_coeffs++ ) { *pmpf_coeffs = access_table ( degree , x2 ) ; *pcoeffs = access_table ( degree , x2 ) .
get_d() ; } // we set up the environment gsl needs to find the roots gsl_poly_complex_workspace * w = gsl_poly_complex_workspace_alloc ( 2 * r + 1 ) ; // now we call gsl's root finder gsl_poly_complex_solve ( coeffs , 2 * r + 1 , w , roots ) ; // and release its workspace gsl_poly_complex_workspace_free ( w ) ; // we only look at the real parts of the roots, which are stored // interleaved real/imag. And we take them back to front, even though // it doesn't matter which end we start with - but conventionally // Pole[0] is the root with the largest absolute value, so I stick with that. // I tried using eigen alternatively, but it found fewer roots // than gsl/blas, so I stick with the GNU code, but for reference, // here is the code to use with eigen3 - for degrees up to 23 it was okay // when I last tested. TODO: maybe the problem is that I did not get // long double values to start with, which is tricky from GMP // { // using namespace Eigen ; // // Eigen::PolynomialSolver < long double , Eigen::Dynamic > solver; // Eigen::Matrix < long double , Eigen::Dynamic , 1 > coeff(2*r+1); // // for ( int i = 0 ; i < 2*r+1 ; i++ ) // { // // this is stupid: can't get a long double for an mpq_type // // TODO take the long route via a string (sigh...)
// coeff[i] = mpf_coeffs[i].get_d() ; // } // // solver.compute(coeff); // // const Eigen::PolynomialSolver < long double , Eigen::Dynamic >::RootsType & r // = solver.roots(); // // for ( int i = r.rows() - 1 ; i >= 0 ; i-- ) // { // if ( std::fabs ( r[i].real() ) < 1 ) // { // std::cout << "eigen gets " << r[i].real() << std::endl ; // } // } // } for ( int i = 2 * r - 2 ; i >= 0 ; i -= 2 ) { if ( std::abs ( roots[i] ) < 1.0 ) { // fetch a double precision root from gsl's output // converting to high-precision mpf_class root ( roots[i] ) ; // for the polishing process, we need a few more // high-precision values mpf_class pa , // current value of the iteration pb , // previous value pdelta ; // delta we want to go below pa = root , pb = 0 ; pdelta = "1e-300" ; // while we're not yet below delta while ( ( pa - pb ) * ( pa - pb ) >= pdelta ) { // assign current to previous pb = pa ; // polish current value newton ( pa , pa , 2*r+1 , mpf_coeffs ) ; } // polishing iteration terminated, we're content and store the value root = pa ; res.push_back ( root ) ; } } } // forward recursive filter, used for testing void forward ( mpf_class * x , int size , mpf_class pole ) { mpf_class X = 0 ; for ( int n = 1 ; n < size ; n++ ) { x[n] = X = x[n] + pole * X ; } } // backward recursive filter, used for testing void backward ( mpf_class * x , int size , mpf_class pole ) { mpf_class X = 0 ; for ( int n = size - 2 ; n >= 0 ; n-- ) { x[n] = X = pole * ( X - x[n] ); } } // to test the poles, we place the reconstruction kernel - a unit-spaced // sampling of the basis function at integral x - into the center of a large // buffer which is otherwise filled with zeroes. Then we apply all poles // in sequence with a forward and a backward run of the prefilter. // Afterwards, we expect to find a unit pulse in the center of the buffer. // Since we use very high precision (see mpf_bits) we can set a conservative // limit for the maximum error, here I use 1e-150.
If this test throws up // warnings, there might be something wrong with the code. void test ( int size , const std::vector < mpf_class > & poles , int degree ) { mpf_class buffer [ size ] ; for ( int k = 0 ; k < size ; k++ ) { buffer[k] = 0 ; } // calculate overall gain of the filter mpf_class lambda = 1 ; for ( int k = 0 ; k < poles.size() ; k++ ) lambda = lambda * ( 1 - poles[k] ) * ( 1 - 1 / poles[k] ) ; int center = size / 2 ; // put basis function samples into the buffer, applying gain for ( int x2 = 0 ; x2 <= degree ; x2 += 2 ) buffer [ center - x2/2 ] = buffer [ center + x2/2 ] = lambda * access_table ( degree , x2 ) ; // filter for ( int k = 0 ; k < poles.size() ; k++ ) { forward ( buffer , size , poles[k] ) ; backward ( buffer , size , poles[k] ) ; } // test mpf_class error = 0 ; mpf_class max_error = 0 ; for ( int x2 = 0 ; x2 <= degree ; x2 += 2 ) { error = abs ( buffer [ center - x2/2 ] - ( x2 == 0 ? 1 : 0 ) ) ; if ( error > max_error ) max_error = error ; error = abs ( buffer [ center + x2/2 ] - ( x2 == 0 ? 1 : 0 ) ) ; if ( error > max_error ) max_error = error ; } mpf_class limit = 1e-150 ; if ( max_error > limit ) std::cerr << "WARNING: degree " << degree << " error exceeds limit 1e-150: " << max_error << std::endl ; } // we want to do the test in long double as well, to see how much // downcasting the data to long double affects precision: vspline::xlf_type get_xlf ( const mpf_class & _x ) { long exp ; std::string sign ( "+" ) ; mpf_class x ( _x ) ; if ( x < 0 ) { x = -x ; sign = std::string ( "-" ) ; } auto str = x.get_str ( exp , 10 , 64 ) ; std::string res = sign + std::string ( "."
) + str + std::string ( "e" ) + std::to_string ( exp ) + std::string ( "l" ) ; vspline::xlf_type ld = std::stold ( res ) ; return ld ; } ; // forward recursive filter, used for testing void ldforward ( vspline::xlf_type * x , int size , vspline::xlf_type pole ) { vspline::xlf_type X = 0 ; for ( int n = 1 ; n < size ; n++ ) { x[n] = X = x[n] + pole * X ; } } // backward recursive filter, used for testing void ldbackward ( vspline::xlf_type * x , int size , vspline::xlf_type pole ) { vspline::xlf_type X = 0 ; for ( int n = size - 2 ; n >= 0 ; n-- ) { x[n] = X = pole * ( X - x[n] ); } } // test with values cast down to long double void ldtest ( int size , const std::vector < mpf_class > & poles , int degree ) { int nbpoles = degree / 2 ; vspline::xlf_type buffer [ size ] ; for ( int k = 0 ; k < size ; k++ ) { buffer[k] = 0 ; } // calculate overall gain of the filter vspline::xlf_type lambda = 1 ; for ( int k = 0 ; k < nbpoles ; k++ ) { auto ldp = get_xlf ( poles[k] ) ; lambda = lambda * ( 1 - ldp ) * ( 1 - 1 / ldp ) ; } int center = size / 2 ; // put basis function samples into the buffer, applying gain for ( int x2 = 0 ; x2 <= degree ; x2 += 2 ) { vspline::xlf_type bf = get_xlf ( access_table ( degree , x2 ) ) ; buffer [ center - x2/2 ] = buffer [ center + x2/2 ] = lambda * bf ; } // filter for ( int k = 0 ; k < nbpoles ; k++ ) { auto ldp = get_xlf ( poles[k] ) ; ldforward ( buffer , size , ldp ) ; ldbackward ( buffer , size , ldp ) ; } // test vspline::xlf_type error = 0 ; vspline::xlf_type max_error = 0 ; for ( int x2 = 0 ; x2 <= degree ; x2 += 2 ) { error = std::abs ( buffer [ center - x2/2 ] - ( x2 == 0 ? 1 : 0 ) ) ; if ( error > max_error ) max_error = error ; error = std::abs ( buffer [ center + x2/2 ] - ( x2 == 0 ?
1 : 0 ) ) ; if ( error > max_error ) max_error = error ; } vspline::xlf_type limit = 1e-12 ; if ( max_error > limit ) std::cerr << "WARNING: degree " << degree << " ld error exceeds limit 1e-12: " << max_error << std::endl ; }
#include <iomanip>
int main ( int argc , char * argv[] ) { mpf_set_default_prec ( mpf_bits ) ; bool print_basis_function = true ; bool print_prefilter_poles = true ; bool print_umbrella_structures = true ; std::cout << std::setprecision ( 64 ) ; fill_basis_table() ; mpf_class value ; if ( print_basis_function ) { for ( int degree = 0 ; degree < degree_limit ; degree++ ) { std::cout << "const vspline::xlf_type K" << degree << "[] = {" << std::endl ; for ( int x2 = 0 ; x2 <= degree ; x2++ ) { value = access_table ( degree , x2 ) ; std::cout << " XLF(" << value << ") ," << std::endl ; get_xlf ( value ) ; } std::cout << "} ;" << std::endl ; } } for ( int degree = 2 ; degree < degree_limit ; degree++ ) { std::vector < mpf_class > res ; calculate_prefilter_poles ( res , degree ) ; if ( res.size() != degree / 2 ) { std::cerr << "not enough poles for degree " << degree << std::endl ; continue ; } if ( print_prefilter_poles ) { std::cout << "const vspline::xlf_type Poles_" << degree << "[] = {" << std::endl ; for ( int p = 0 ; p < degree / 2 ; p++ ) { value = res[p] ; std::cout << " XLF(" << value << ") ," << std::endl ; } std::cout << "} ;" << std::endl ; } // test if the roots are good test ( 10000 , res , degree ) ; // optional, to see what happens when data are cast down to long double ldtest ( 10000 , res , degree ) ; } if ( print_umbrella_structures ) { std::cout << std::noshowpos ; std::cout << "const vspline::xlf_type* const precomputed_poles[] = {" << std::endl ; std::cout << " 0, " << std::endl ; std::cout << " 0, " << std::endl ; for ( int i = 2 ; i < degree_limit ; i++ ) std::cout << " Poles_" << i << ", " << std::endl ; std::cout << "} ;" << std::endl ; std::cout << "const vspline::xlf_type* const precomputed_basis_function_values[] = {" << std::endl ; for ( int
i = 0 ; i < degree_limit ; i++ ) std::cout << " K" << i << ", " << std::endl ; std::cout << "} ;" << std::endl ; } }
/************************************************************************/
/* vspline - a set of generic tools for creation and evaluation */
/* of uniform b-splines */
/* Copyright 2015 - 2018 by Kay F. Jahnke */
/* Published under the MIT license - see the LICENSE file. */
/************************************************************************/
/*! \file brace.h \brief This file provides code for 'bracing' a b-spline's coefficient array.
Note that this isn't really user code, it's code used by class vspline::bspline. Inspired by libeinspline, I wrote code to 'brace' the spline coefficients. The concept is this: while the IIR filter used to calculate the coefficients has infinite support (though arithmetic precision limits this in real-world applications), the evaluation of the spline at a specific location only looks at a small window of coefficients (compact, finite support). This fact can be exploited by taking note of how large the support area is and providing a few more coefficients in a frame around the 'core' coefficients to allow the evaluation to proceed without having to check for boundary conditions. While the difference is not excessive (the main computational cost is the actual evaluation itself), it's still nice to be able to code the evaluation without boundary checking, which makes the code very straightforward and legible. There is another aspect to bracing: In my implementation of vectorized evaluation, the window into the coefficient array used to pick out coefficients to evaluate at a specific location is coded as a set of offsets from its center. This way, several such windows can be processed in parallel. This mechanism can only function efficiently in a braced coefficient array, since it would otherwise have to give up if any of the windows accessed by the vector of coordinates had members outside the (unbraced) coefficient array and submit the coordinate vector to individual processing. I consider the logic to code this and the loss in performance too much of a bother to go down this path; all my evaluation code uses braced coefficient arrays. Of course the user is free to omit bracing, but then they have to use their own evaluation code. What's in the brace? Of course this depends on the boundary conditions chosen.
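For the mirroring boundary conditions, the value a brace holds can be understood as the result of folding an out-of-range coordinate back into the defined range. A minimal sketch of the MIRROR convention (reflection at both bounds without repeating the bounding samples, so the extrapolated signal has period 2 * (n - 1)); the function name `mirror_fold` is made up for this illustration, it is not vspline API, and it assumes n >= 2:

```cpp
#include <cstdlib>

// fold an arbitrary integral coordinate k into the range [0, n-1],
// following the MIRROR convention: ... 2 1 0 1 2 ... n-2 n-1 n-2 ...
int mirror_fold ( int k , int n )
{
  int period = 2 * ( n - 1 ) ;  // period of the mirrored signal
  k = std::abs ( k ) % period ; // mirroring at 0 makes -k equivalent to k,
                                // then reduce modulo the period
  if ( k >= n )                 // upper half of the period mirrors back down
    k = period - k ;
  return k ;
}
```

A braced coefficient array makes this index manipulation unnecessary at evaluation time: the slices the folded index would fetch from the core are copied into the frame once, up front.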
In vspline, I offer code for several boundary conditions, but most have something in common: the original, finite sequence is extrapolated into an infinite periodic signal. With straight PERIODIC boundary conditions, the initial sequence is immediately followed and preceded by copies of itself. The other boundary conditions mirror the signal in some way and then repeat the mirrored signal periodically. Using boundary conditions like these, both the extrapolated signal and the coefficients share the same periodicity and mirroring. There is one exception: 'natural' boundary conditions use point-mirroring on the bounds. With this extrapolation, the extrapolated value can *not* be obtained by a coordinate manipulation. The method of bracing the spline does still function, though. So with bracing, we can provide b-splines with 'natural' boundary conditions as well, but here we are limited to evaluation inside the spline's defined range. There are two ways of arriving at a coefficient array: We can start from the extrapolated signal, pick a section large enough to make margin effects vanish (due to limited arithmetic precision), prefilter it and pick out a subsection containing the 'core' coefficients and their support. Alternatively, we can work only on the core coefficients, calculate suitable initial causal and anticausal coefficients (where the calculation considers the extrapolated signal, which remains implicit), and apply the filter with these known initial coefficients. vspline uses the latter approach. Once the 'core' coefficients are known, the brace is filled. The bracing can be performed without any solver-related maths by simply copying (possibly trivially modified) slices of the core coefficients to the margin area. Since the bracing mainly requires copying data or trivial maths we can do the operations on higher-dimensional objects, like slices of a volume.
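The prefilter itself, run with known poles, can be illustrated in one dimension. The sketch below mirrors the test routine in bootstrap.cc, but in plain double precision and with the single degree-2 pole (2 * sqrt(2) - 3) hardcoded; the function names are made up for the illustration. Scaling the unit-spaced sampling of the degree-2 basis function (1/8, 3/4, 1/8) by the filter's gain and running the forward (causal) and backward (anticausal) recursion recovers a unit pulse:

```cpp
#include <vector>
#include <cmath>
#include <cassert>

// one forward and one backward recursion for a single pole, with
// naive zero initial values - acceptable here because the buffer is
// large and zero-filled away from the center
void run_prefilter ( std::vector<double> & x , double pole )
{
  double X = 0.0 ;
  for ( std::size_t n = 1 ; n < x.size() ; n++ )      // causal pass
    x[n] = X = x[n] + pole * X ;
  X = 0.0 ;
  for ( int n = int ( x.size() ) - 2 ; n >= 0 ; n-- ) // anticausal pass
    x[n] = X = pole * ( X - x[n] ) ;
}

// place the gain-scaled degree-2 reconstruction kernel in the middle
// of a zero-filled buffer and prefilter it; the result should be a
// unit pulse at the center
std::vector<double> reconstruct_pulse()
{
  double pole = 2.0 * std::sqrt ( 2.0 ) - 3.0 ;            // degree-2 pole
  double lambda = ( 1.0 - pole ) * ( 1.0 - 1.0 / pole ) ;  // overall gain
  std::vector<double> buffer ( 64 , 0.0 ) ;
  int center = 32 ;
  buffer [ center ] = lambda * 0.75 ;                           // b2(0)
  buffer [ center - 1 ] = buffer [ center + 1 ] = lambda * 0.125 ; // b2(+-1)
  run_prefilter ( buffer , pole ) ;
  return buffer ;
}
```

Far from the buffer's ends the result is a unit pulse to within floating point precision; near the ends the ad-hoc zero initial values would show, which is why working only on the core coefficients requires calculating suitable initial causal and anticausal coefficients instead.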
To efficiently code these operations we make use of vigra's multi-math facility and its bindAt array method, which makes these subarrays easily available. */ // TODO: while this is convenient, it's not too fast, as it's neither multithreaded nor // vectorized. Still in most 'normal' scenarios the execution time is negligible, and since // the code is trivial, it autovectorizes well. // // TODO: there are 'pathological' cases where one brace is larger than the other brace // and the width of the core together. These cases can't be handled for all bracing modes // and will result in an exception.
#ifndef VSPLINE_BRACE_H
#define VSPLINE_BRACE_H
#include <stdexcept>
#include <vigra/multi_array.h>
#include "common.h"
namespace vspline { /// class bracer encodes the entire bracing process. Note that contrary /// to my initial implementation, class bracer is now used exclusively /// for populating the frame around a core area of data. It has no code /// to determine what size a brace/frame should have. This is now /// determined in class bspline, see especially class bspline's methods /// get_left_brace_size(), get_right_brace_size() and setup_metrics(). // KFJ 2018-03-07 cosmetics: changed the template argument list to take // template arguments to the MultiArrayView instead of the view's type // itself, to narrow the range of acceptable input to MultiArrayViews // or types which can be converted to them, and make sure that inside // class bracer we only work on MultiArrayViews. template < unsigned int _dimension , typename _value_type > struct bracer { typedef _value_type value_type ; enum { dimension = _dimension } ; typedef vigra::MultiArrayView < dimension , value_type > view_type ; typedef typename view_type::difference_type shape_type ; /// apply the bracing to the array, performing the required copy/arithmetic operations /// to the 'frame' around the core. This routine performs the operation along axis 'dim'.
/// This is also the routine to be used for explicitly extrapolating a signal: /// you place the data into the center of a larger array, and pass in the sizes of the /// 'empty' space which is to be filled with the extrapolated data. /// /// the bracing is done one-left-one-right, to avoid corner cases as best as possible. /// This makes it possible to have signals which are shorter than the brace and still /// produce a correct brace for them. static void apply ( view_type & a , // containing array bc_code bc , // boundary condition code int lsz , // space to the left which needs to be filled int rsz , // ditto, to the right int axis ) // axis along which to apply bracing { int w = a.shape ( axis ) ; // width of containing array along axis 'axis' int m = w - ( lsz + rsz ) ; // width of 'core' array if ( m < 1 ) // has to be at least 1 throw shape_mismatch ( "combined brace sizes must be at least one less than container size" ) ; if ( m == 1 ) { // KFJ 2018-02-10 // special case: the core has only shape 1 in this direction. // if this is so, we fill the brace with the single slice in the // middle, no matter what the boundary condition code may say.
for ( int s = 0 ; s < w ; s++ ) { if ( s != lsz ) a.bindAt ( axis , s ) = a.bindAt ( axis , lsz ) ; } return ; } if ( ( lsz > m + rsz ) || ( rsz > m + lsz ) ) { // not enough data to fill brace if ( bc == PERIODIC || bc == NATURAL || bc == MIRROR || bc == REFLECT ) throw std::out_of_range ( "each brace must be smaller than the sum of its opposite brace and the core's width" ) ; } int l0 = lsz - 1 ; // index of innermost empty slice on the left; like begin() int r0 = lsz + m ; // ditto, on the right int lp = l0 + 1 ; // index of leftmost occupied slice (p for pivot) int rp = r0 - 1 ; // index of rightmost occupied slice int l1 = -1 ; // index one before outermost empty slice to the left int r1 = w ; // index one after outermost empty slice on the right; like end() int lt = l0 ; // index to left target slice int rt = r0 ; // index to right target slice int ls , rs ; // indices to left and right source slice, will be set below int ds = 1 ; // step for source index, +1 == forward, used for all mirroring modes // for periodic bracing, it's set to -1. switch ( bc ) { case PERIODIC : { ls = l0 + m ; rs = r0 - m ; ds = -1 ; // step through source in reverse direction break ; } case NATURAL : case MIRROR : { ls = l0 + 2 ; rs = r0 - 2 ; break ; } case CONSTANT : case REFLECT : { ls = l0 + 1 ; rs = r0 - 1 ; break ; } case ZEROPAD : { break ; } default: { throw not_supported ( "boundary condition not supported by vspline::bracer" ) ; break ; } } for ( int i = max ( lsz , rsz ) ; i > 0 ; --i ) { if ( lt > l1 ) { switch ( bc ) { case PERIODIC : case MIRROR : case REFLECT : { // with these three bracing modes, we simply copy from source to target a.bindAt ( axis , lt ) = a.bindAt ( axis , ls ) ; break ; } case NATURAL : { // here, we subtract the source slice from twice the 'pivot' // easiest would be: // a.bindAt ( axis , lt ) = a.bindAt ( axis , lp ) * value_type(2) - a.bindAt ( axis , ls ) ; // but this fails in 1D TODO: why?
auto target = a.bindAt ( axis , lt ) ; // get a view to left target slice target = a.bindAt ( axis , lp ) ; // assign value of left pivot slice target *= value_type(2) ; // double that target -= a.bindAt ( axis , ls ) ; // subtract left source slice break ; } case CONSTANT : { // here, we repeat the 'pivot' slice a.bindAt ( axis , lt ) = a.bindAt ( axis , lp ) ; break ; } case ZEROPAD : { // fill with 0 a.bindAt ( axis , lt ) = value_type() ; break ; } default : // default: leave untouched break ; } --lt ; ls += ds ; } if ( rt < r1 ) { // essentially the same, but with rs instead of ls, etc. switch ( bc ) { case PERIODIC : case MIRROR : case REFLECT : { // with these three bracing modes, we simply copy from source to target a.bindAt ( axis , rt ) = a.bindAt ( axis , rs ) ; break ; } case NATURAL : { // here, we subtract the source slice from twice the 'pivot' // the easiest would be: // a.bindAt ( axis , rt ) = a.bindAt ( axis , rp ) * value_type(2) - a.bindAt ( axis , rs ) ; // but this fails in 1D TODO: why? auto target = a.bindAt ( axis , rt ) ; // get a view to right target slice target = a.bindAt ( axis , rp ) ; // assign value of pivot slice target *= value_type(2) ; // double that target -= a.bindAt ( axis , rs ) ; // subtract source slice break ; } case CONSTANT : { // here, we repeat the 'pivot' slice a.bindAt ( axis , rt ) = a.bindAt ( axis , rp ) ; break ; } case ZEROPAD : { // fill with 0 a.bindAt ( axis , rt ) = value_type() ; break ; } default : // default: leave untouched break ; } ++rt ; rs -= ds ; } } } /// This overload of 'apply' braces along all axes in one go.
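The per-axis slice logic above can be condensed into a self-contained 1-D sketch. Everything below (the helper brace_1d and the pared-down bc_code enum) is hypothetical illustration code, not part of vspline: it uses a plain std::vector instead of vigra views and omits the error checks and the m == 1 special case.

```cpp
#include <vector>
#include <algorithm>
#include <cassert>

// pared-down stand-in for vspline's bc_code, covering the cases used here
enum bc_code { PERIODIC , NATURAL , MIRROR , REFLECT , CONSTANT , ZEROPAD } ;

// 1-D sketch of bracer::apply: 'a' holds the core values at
// positions [ lsz , lsz + m ) ; the brace positions are filled in place
void brace_1d ( std::vector < double > & a , bc_code bc , int lsz , int rsz )
{
  int w = (int) a.size() ;
  int m = w - ( lsz + rsz ) ;        // width of the core
  int l0 = lsz - 1 ;                 // innermost empty slot on the left
  int r0 = lsz + m ;                 // ditto, on the right
  int lp = l0 + 1 , rp = r0 - 1 ;    // 'pivot' slots
  int lt = l0 , rt = r0 ;            // target slots
  int ls = 0 , rs = 0 , ds = 1 ;     // source slots and source step
  switch ( bc )
  {
    case PERIODIC : ls = l0 + m ; rs = r0 - m ; ds = -1 ; break ;
    case NATURAL :
    case MIRROR :   ls = l0 + 2 ; rs = r0 - 2 ; break ;
    case CONSTANT :
    case REFLECT :  ls = l0 + 1 ; rs = r0 - 1 ; break ;
    case ZEROPAD :  break ;
  }
  for ( int i = std::max ( lsz , rsz ) ; i > 0 ; --i )
  {
    if ( lt >= 0 )
    {
      if ( bc == PERIODIC || bc == MIRROR || bc == REFLECT ) a[lt] = a[ls] ;
      else if ( bc == NATURAL ) a[lt] = 2 * a[lp] - a[ls] ; // point-mirror at the pivot
      else if ( bc == CONSTANT ) a[lt] = a[lp] ;            // repeat the pivot
      else a[lt] = 0 ;                                      // ZEROPAD
      --lt ; ls += ds ;
    }
    if ( rt < w )
    {
      if ( bc == PERIODIC || bc == MIRROR || bc == REFLECT ) a[rt] = a[rs] ;
      else if ( bc == NATURAL ) a[rt] = 2 * a[rp] - a[rs] ;
      else if ( bc == CONSTANT ) a[rt] = a[rp] ;
      else a[rt] = 0 ;
      ++rt ; rs -= ds ;
    }
  }
}
```

For a core of { 10, 20, 30, 40 } with a brace of two on either side, MIRROR bracing yields { 30, 20, 10, 20, 30, 40, 30, 20 } - the signal mirrored on the first and last core values, just as the slice-copy scheme above produces it per axis.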
static void apply ( view_type& a , // target array, containing the core and (empty) frame vigra::TinyVector < bc_code , dimension > bcv , // boundary condition codes shape_type left_corner , // sizes of left braces shape_type right_corner ) // sizes of right braces { for ( int dim = 0 ; dim < dimension ; dim++ ) apply ( a , bcv[dim] , left_corner[dim] , right_corner[dim] , dim ) ; } } ; } ; // end of namespace vspline #endif // VSPLINE_BRACE_H kfj-vspline-4b365417c271/bspline.h /************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* The git repository for this software is at */ /* */ /* https://bitbucket.org/kfj/vspline */ /* */ /* Please direct questions, bug reports, and contributions to */ /* */ /* kfjahnke+vspline@gmail.com */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /*! \file bspline.h \brief defines class bspline class bspline is an object to contain a b-spline's coefficients and some metadata in one handy package. It also provides easy access to b-spline prefiltering. The idea is that user code establishes a bspline object representing the data at hand and then proceeds to create 'evaluators' to evaluate the spline. You may be reminded of SciPy's bisplrep object, and I admit that SciPy's bspline code has been one of my inspirations. It attempts to do 'the right thing' by automatically creating suitable helper objects and parametrization so that the spline does what it's supposed to do. Most users will not need anything else, and using class bspline is quite straightforward. It's quite possible to have a b-spline up and running with a few lines of code without even having to make choices concerning its parametrization, since there are sensible defaults for everything. At the same time, pretty much everything *can* be parametrized. Note that class bspline does not provide evaluation of the spline. To evaluate, objects of class evaluator (see eval.h) are used, which construct from a bspline object and additional parameters, like, whether to calculate the spline's value or its derivative(s) or whether to use optimizations for special cases. While using 'raw' coefficient arrays with an evaluation scheme which applies boundary conditions is feasible and most memory-efficient, it's not so well suited for very fast evaluation, since the boundary treatment needs conditionals, and 'breaks' uniform access, which is especially detrimental when using vectorization.
So vspline uses coefficient arrays with a few extra coefficients 'framing' or 'bracing' the 'core' coefficients. Since evaluation of the spline looks at a certain small section of coefficients (the evaluator's 'support'), the bracing is chosen so that this lookup will always succeed without having to consider boundary conditions: the brace is set up to make the boundary conditions explicit, and the evaluation can proceed blindly without bounds checking. With large coefficient arrays, the little extra space needed for the brace becomes negligible, but the code for evaluation becomes faster and simpler. In effect, 'bracing' is taken a little bit further than merely providing enough extra coefficients to cover the support: additional coefficients are produced to allow for the spline to be evaluated without bounds checking - at the lower and upper limit of the spline's defined range - and even slightly beyond those limits: a safeguard against quantization errors. This makes the code more robust: being very strict about the ends of the defined range can easily result in quantization errors producing out-of-bounds access to the coefficient array, so the slightly wider brace acts as a safeguard. While the 'brace' has a specific size which depends on the parameters (see get_left_brace_size() and get_right_brace_size()), there may be even more additional coefficients if this is needed (see parameter 'headroom'). All additional coefficients around the core form the spline's 'frame'. So the frame is always at least as large as the brace. class bspline handles two views to the coefficients it operates on, these are realized as vigra::MultiArrayViews, and they share the same storage: - the 'container' is a view to all coefficients held by the spline, including the extra coefficients in its 'frame'. - the 'core', is a view to a subarray of the container with precisely the same shape as the knot point data over which the spline is calculated.
The coefficients in the core correspond 1:1 with the knot point data. Probably the most common scenario is that the source data for the spline are available from someplace like a file. Instead of reading the file's contents into memory first and passing the memory to class bspline, there is a more efficient way: a bspline object is set up first, with the specification of the size of the incoming data and the intended mode of operation. The bspline object allocates the memory it will need for the purpose, but doesn't do anything else. The 'empty' bspline object is then 'filled' by the user by putting data into its 'core' area. Subsequently, prefilter() is called, which converts the data to b-spline coefficients. This way, only one block of memory is used throughout, the initial data are overwritten by the coefficients, operation is in-place and most efficient. If this pattern can't be followed, there are alternatives: - if data are passed to prefilter(), they will be taken as containing the knot point data, rather than expecting the knot point data to be in the bspline object's memory already. This can also be used to reuse a bspline object with new data. The data passed in will not be modified. - if a view to an array at least the size of the container array is passed into bspline's constructor, this view is 'adopted' and all operations will use the data it refers to. The caller is responsible for keeping these data alive while the bspline object exists, and relinquishes control over the data, which may be changed by the bspline object. */ #ifndef VSPLINE_BSPLINE_H #define VSPLINE_BSPLINE_H #include #include "prefilter.h" #include "brace.h" namespace vspline { /// struct bspline is the object in vspline holding b-spline coefficients. /// In a way, the b-spline 'is' its coefficients, since it is totally /// determined by them - while, of course, the 'actual' spline is an /// n-dimensional curve.
So, even if this is a bit sloppy, I often refer /// to the coefficients as 'the spline', and have named struct bspline /// accordingly, even though it just holds the coefficients. /// /// The coefficients held in a bspline object are 'braced', providing a few /// extra extrapolated coefficients around a 'core' area corresponding with /// the knot point, or original, data. This way, they can be used by vspline's /// evaluation code which relies on such a brace being present. /// /// struct bspline is a convenience class which bundles a coefficient array /// (and its creation) with a set of metadata describing the parameters used /// to create the coefficients and the resulting data. I have chosen to implement /// class bspline so that there is only a minimal set of template arguments, /// namely the spline's data type (like pixels etc.) and its dimension. All /// other parameters relevant to the spline's creation are passed in at /// construction time. This way, if explicit specialization becomes necessary /// (like, to interface to code which can't use templates) the number of /// specializations remains manageable. This design decision pertains specifically /// to the spline's degree, which could also be implemented as a template argument, /// allowing for some optimization by making some members static. Yet going down /// this path requires explicit specialization for every spline degree used and /// the performance gain I found doing so was hardly measurable, while automatic /// testing became difficult and compilation times grew. /// /// class bspline may or may not 'own' the coefficient data it refers to - this /// depends on the specific initialization used, but is handled privately by /// class bspline, using a shared_ptr to the data if they are owned, which /// makes bspline objects trivially copyable.
template < class _value_type , unsigned int _dimension > struct bspline { typedef _value_type value_type ; enum { dimension = _dimension } ; /// if the coefficients are owned, an array of this type holds the data typedef vigra::MultiArray < dimension , value_type > array_type ; /// data are read and written to vigra MultiArrayViews typedef vigra::MultiArrayView < dimension , value_type > view_type ; /// multidimensional index type typedef typename view_type::difference_type shape_type ; /// nD type for one boundary condition per axis typedef typename vigra::TinyVector < bc_code , dimension > bcv_type ; /// elementary type of value_type, like float or double typedef typename ExpandElementResult < value_type >::type ele_type ; /// number of channels in value_type enum { channels = ExpandElementResult < value_type >::size } ; /// the type of 'this' bspline and of a 'channel view' typedef bspline < value_type , dimension > spline_type ; typedef bspline < ele_type , dimension > channel_view_type ; private: // _p_coeffs points to a vigra::MultiArray, which is either default-initialized // and contains no data, or holds data which are viewed by 'container'. Using a // std::shared_ptr here has the pleasant side-effect that class bspline objects // can use the default copy and assignment operators. std::shared_ptr < array_type > _p_coeffs ; public: const bcv_type bcv ; // boundary conditions, see common.h int spline_degree ; // degree of the spline (3 == cubic spline) xlf_type tolerance ; // acceptable error view_type container ; // view to coefficient container array (incl. frame) view_type core ; // view to the core part of the coefficient array shape_type left_frame ; // total width(s) of the left handside frame shape_type right_frame ; // total width(s) of the right handside frame /// lower_limit returns the lower bound of the spline's defined range. /// This is usually 0.0, but with REFLECT boundary condition it's -0.5, /// the lower point of reflection. 
The lowest coordinate at which the /// spline can be accessed may be lower: even splines have wider support, /// and splines with extra headroom add even more room to manoeuvre. // TODO might introduce code to provide the 'technical limits' xlf_type lower_limit ( const int & axis ) const { xlf_type limit = 0.0L ; if ( bcv [ axis ] == vspline::REFLECT ) limit = -0.5L ; return limit ; } vigra::TinyVector < xlf_type , dimension > lower_limit() const { vigra::TinyVector < xlf_type , dimension > limit ; for ( int d = 0 ; d < dimension ; d++ ) limit[d] = lower_limit ( d ) ; return limit ; } /// upper_limit returns the upper bound of the spline's defined range. /// This is normally M - 1 if the shape for this axis is M. Splines with /// REFLECT boundary condition use M - 0.5, the upper point of reflection, /// and periodic splines use M. The highest coordinate at which the spline /// may be accessed safely may be higher. xlf_type upper_limit ( const int & axis ) const { xlf_type limit = core.shape ( axis ) - 1 ; if ( bcv [ axis ] == vspline::REFLECT ) limit += 0.5L ; else if ( bcv [ axis ] == vspline::PERIODIC ) limit += 1.0L ; return limit ; } vigra::TinyVector < xlf_type , dimension > upper_limit() const { vigra::TinyVector < xlf_type , dimension > limit ; for ( int d = 0 ; d < dimension ; d++ ) limit[d] = upper_limit ( d ) ; return limit ; } /// get_left_brace_size and get_right_brace_size calculate the size of /// the brace vspline puts around the 'core' coefficients to allow evaluation /// inside the defined range (and even slightly beyond) without bounds /// checking. These routines are static to allow user code to establish /// vspline's bracing requirements without creating a bspline object. /// user code might use this information to generate coefficient arrays /// suitable for use with vspline evaluation code, sidestepping use of /// a bspline object.
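The defined-range rules implemented by lower_limit and upper_limit above can be tabulated in a few lines. This is a hypothetical, standalone sketch (plain doubles and a pared-down bc_code, not vspline's API) for a single axis of extent M:

```cpp
#include <cassert>

// pared-down stand-in for vspline's bc_code
enum bc_code { PERIODIC , MIRROR , REFLECT } ;

// lower bound of the spline's defined range for one axis
double lower_limit ( bc_code bc )
{ return bc == REFLECT ? -0.5 : 0.0 ; } // -0.5 is the lower point of reflection

// upper bound of the spline's defined range for one axis of extent M
double upper_limit ( bc_code bc , long M )
{
  double limit = M - 1 ;
  if ( bc == REFLECT ) limit += 0.5 ;       // upper point of reflection
  else if ( bc == PERIODIC ) limit += 1.0 ; // the period closes at M
  return limit ;
}
```

So a 100-sample axis is defined over [0,99] for MIRROR, [-0.5,99.5] for REFLECT and [0,100] for PERIODIC boundary conditions.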
static shape_type get_left_brace_size ( int spline_degree , bcv_type bcv ) { int support = spline_degree / 2 ; // we start out with left_brace as large as the support // of the reconstruction kernel shape_type left_brace ( support ) ; // for some situations, we extend the array further along a specific axis for ( int d = 0 ; d < dimension ; d++ ) { // If the spline is done with REFLECT boundary conditions, // the lower and upper limits are between bounds. // the lower limit in this case is -0.5. When using // floor() or round() on this value, we receive -1, // so we need to extend the left brace. // if rounding could be done so that -0.5 is rounded // towards zero, this brace increase could be omitted // for even splines, but this would also bring operation // close to dangerous terrain: an infinitesimal undershoot // would already produce an out-of-bounds access. if ( bcv[d] == vspline::REFLECT ) { left_brace[d] ++ ; } // for other boundary conditions, the lower limit is 0. // for odd splines, // as long as evaluation is at positions >= 0 this is // okay, but as soon as evaluation is tried with a value // even just below 0, we'd have an out-of-bounds access, // with potential memory fault. Rather than requiring // evaluation to never undershoot, we err on the side // of caution and extend the left brace, so that // quantization errors won't result in a crash. // This is debatable and could be omitted, if it can // be made certain that evaluation will never be tried // at values even infinitesimally below zero. // for even splines, this problem does not exist, since // coordinate splitting is done with std::round. 
else if ( spline_degree & 1 ) { left_brace[d]++ ; } } return left_brace ; } static shape_type get_right_brace_size ( int spline_degree , bcv_type bcv ) { int support = spline_degree / 2 ; // we start out with right_brace as large as the support // of the reconstruction kernel shape_type right_brace ( support ) ; // for some situations, we extend the array further along a specific axis for ( int d = 0 ; d < dimension ; d++ ) { // If the spline is done with REFLECT boundary conditions, // the lower and upper limits are between bounds. // So the upper limit is Z + 0.5 where Z is integer. // using floor on this value lands at Z, which is fine, // but using round() (as is done for even splines) // lands at Z+1, so for this case we need to extend // the right brace. If we could use a rounding mode // rounding towards zero, we could omit this extension, // but we'd also be cutting it very fine. Again we err // on the side of caution. if ( bcv[d] == vspline::REFLECT ) { if ( ! ( spline_degree & 1 ) ) right_brace[d] ++ ; } // The upper limit is M-1 for most splines, and M-1+0.5 for // splines with REFLECT BCs. When accessing the spline at // this value, we'd be out of bounds. // For splines done with REFLECT BCs, we have to extend the // right brace to allow access to coordinates in [M-1,M-1+0.5], // there is no other option. // For other splines, We could require the evaluation code // to check and split incoming values of M-1 to M-2, 1.0, but this // would require additional inner-loop code. So we add another // coefficient on the upper side for these as well. // This is debatable, but with the current implementation of the // evaluation it's necessary. // So, erring on the side of caution, we add the extra coefficient // for all odd splines. if ( spline_degree & 1 ) { right_brace[d]++ ; } // periodic splines need an extra coefficient on the upper // side, to allow evaluation in [M-1,M]. 
This interval is // implicit in the original data since the value at M is // equal to the value at 0, but if we want to process without // bounds checking and index manipulations, we must provide // an extra coefficient. if ( bcv[d] == vspline::PERIODIC ) { right_brace[d]++ ; } } return right_brace ; } /// convenience method to calculate the shape of a container array needed to hold /// the coefficients of a spline with the given properties. The arguments are the /// same as those passed to the bspline object's constructor, but this method is /// static, so it can be called on the spline's type and does not need an object. /// I'm including this to make it easier for code which creates the container /// array externally before constructing the bspline object, rather than relying /// on class bspline to allocate its own storage. static shape_type get_container_shape ( int spline_degree , bcv_type bcv , shape_type core_shape , int headroom ) { auto left_frame = get_left_brace_size ( spline_degree , bcv ) ; left_frame += headroom ; auto right_frame = get_right_brace_size ( spline_degree , bcv ) ; right_frame += headroom ; shape_type container_shape = core_shape + left_frame + right_frame ; return container_shape ; } template < typename = std::enable_if < _dimension == 1 > > static long get_container_shape ( int spline_degree , vspline::bc_code bc , long core_shape , int headroom ) { return get_container_shape ( spline_degree , bcv_type ( bc ) , shape_type ( core_shape ) , headroom ) [ 0 ] ; } /// construct a bspline object with appropriate storage space to contain and process an array /// of knot point data with shape _core_shape. Depending on the other /// parameters passed, more space than _core_shape may be allocated. Once the bspline object /// is ready, usually it is filled with the knot point data and then the prefiltering needs /// to be done.
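For a single axis, the brace-size rules above boil down to a few integer adjustments. The standalone sketch below (hypothetical helper names, pared-down bc_code, plain ints instead of shape_type) mirrors the per-axis logic of get_left_brace_size, get_right_brace_size and get_container_shape:

```cpp
#include <cassert>

// pared-down stand-in for vspline's bc_code
enum bc_code { PERIODIC , NATURAL , MIRROR , REFLECT , CONSTANT , ZEROPAD } ;

int left_brace_size ( int spline_degree , bc_code bc )
{
  int b = spline_degree / 2 ;          // support of the reconstruction kernel
  if ( bc == REFLECT ) ++b ;           // lower limit -0.5 floors/rounds to -1
  else if ( spline_degree & 1 ) ++b ;  // guard against undershoot below 0
  return b ;
}

int right_brace_size ( int spline_degree , bc_code bc )
{
  int b = spline_degree / 2 ;
  if ( bc == REFLECT && ! ( spline_degree & 1 ) ) ++b ; // round(M-1+0.5) lands at M
  if ( spline_degree & 1 ) ++b ;       // allow access at the upper limit itself
  if ( bc == PERIODIC ) ++b ;          // make the implicit value at M explicit
  return b ;
}

// extent of the container along one axis: core plus frame on either side
long container_extent ( int spline_degree , bc_code bc ,
                        long core_extent , int headroom )
{
  return core_extent + left_brace_size ( spline_degree , bc ) + headroom
                     + right_brace_size ( spline_degree , bc ) + headroom ;
}
```

For example, a cubic MIRROR spline over 100 samples gets a brace of two on either side, so the container is 104 coefficients long; a cubic PERIODIC spline needs one more on the right.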
This sequence assures that the knot point data are present in memory only once, /// the prefiltering is done in-place. So the user can create the bspline, fill in data (like, /// from a file), prefilter, and then evaluate. /// /// alternatively, if the knot point data are already manifest elsewhere, they can be passed /// to prefilter(). With this mode of operation, they are 'pulled in' during prefiltering. /// /// It's possible to pass in a view to an array providing space for the coefficients, /// or even the coefficients themselves. This is done via the parameter _space. This has /// to be an array of the same or larger shape than the container array would end up having /// given all the other parameters. This view is then 'adopted' and subsequent processing /// will operate on its data. /// /// The additional parameter 'headroom' is used to make the 'frame' even wider. This is /// needed if the spline is to be 'shifted' up (evaluated as if it had been prefiltered /// with a higher-degree prefilter) - see shift(). The headroom goes on top of the brace. /// /// While bspline objects allow very specific parametrization, most use cases won't use /// parameters beyond the first few. The only mandatory parameter is, obviously, the /// shape of the knot point data, the original data which the spline is built over. /// This shape 'returns' as the bspline object's 'core' shape. If this is the only /// parameter passed to the constructor, the resulting bspline object will be a /// cubic b-spline with mirror boundary conditions, allocating its own storage for the /// coefficients. /// /// Note that passing tolerance = 0.0 may increase prefiltering time significantly, /// especially when prefiltering 1D splines, which can't use multithreaded and vectorized /// code in this case. Really, tolerance 0 doesn't produce significantly better results /// than the default, which is a very low value already.
The tolerance 0 code is there /// more for completeness' sake, as it actually produces the result of the prefilter /// using the *formula* to calculate the initial causal and anticausal coefficient /// precisely, whereas the small tolerance used by default is so small that it /// roughly mirrors the arithmetic precision which can be achieved with the given /// type, which leads to nearly the same initial coefficients. Oftentimes even the default /// is too conservative - a 'reasonable' value is in the order of magnitude of the noise /// in the signal you're processing. But using the default won't slow things down a great /// deal, since it only results in the initial coefficients being calculated a bit less /// quickly. With nD data, tolerance 0 is less of a problem because the operation will /// still be multithreaded and vectorized. bspline ( shape_type _core_shape , // shape of knot point data int _spline_degree = 3 , // spline degree with reasonable default bcv_type _bcv = bcv_type ( MIRROR ) , // boundary conditions and common default xlf_type _tolerance = -1.0 , // acceptable error (-1: automatic) int headroom = 0 , // additional headroom, for 'shifting' view_type _space = view_type() // coefficient storage to 'adopt' ) : spline_degree ( _spline_degree ) , bcv ( _bcv ) , tolerance ( _tolerance ) { if ( _tolerance < 0.0 ) { // I suppose this is a reasonable default for 'tolerance': tolerance = std::numeric_limits < ele_type > :: epsilon() ; // If numeric_limits is zero (because ele_type is integral) // we set it to xlf_type's epsilon to avoid using zero, which // we only use if the user explicitly selects it. 
if ( tolerance == 0 ) tolerance = std::numeric_limits < xlf_type > :: epsilon() ; } // first, calculate the shapes and sizes used internally left_frame = get_left_brace_size ( spline_degree , bcv ) ; left_frame += headroom ; right_frame = get_right_brace_size ( spline_degree , bcv ) ; right_frame += headroom ; shape_type container_shape = _core_shape + left_frame + right_frame ; // now either adopt external memory or allocate memory for the coefficients if ( _space.hasData() ) { // caller has provided space for the coefficient array. This space // has to be at least as large as the container_shape we have // determined to make sure it's compatible with the other parameters. // With the array having been provided by the caller, it's the caller's // responsibility to keep the data alive as long as the bspline object // is used to access them. if ( ! ( allGreaterEqual ( _space.shape() , container_shape ) ) ) throw shape_mismatch ( "the intended container shape does not fit into the shape of the storage space passed in" ) ; // if the shape matches, we adopt the data in _space. // We take a view to the container_shape-sized subarray only. container = _space.subarray ( shape_type() , container_shape ) ; // _p_coeffs is made to point to a default-constructed MultiArray, // which holds no data. _p_coeffs = std::make_shared < array_type >() ; } else { // _space was default-constructed and has no data. // in this case we allocate a container array and hold a shared_ptr // to it. so we can copy bspline objects without having to worry about // dangling pointers, or who destroys the array. _p_coeffs = std::make_shared < array_type > ( container_shape ) ; // 'container' is made to refer to a view to this array. container = *_p_coeffs ; } // finally we set the view to the core area core = container.subarray ( left_frame , left_frame + _core_shape ) ; } ; /// get a bspline object for a single channel of the data. 
This is lightweight /// and requires the viewed data to remain present as long as the channel view is used. /// the channel view inherits all metrics from its parent, only the MultiArrayViews /// to the data are different. channel_view_type get_channel_view ( const int & channel ) { assert ( channel < channels ) ; ele_type * base = (ele_type*) ( container.data() ) ; base += channel ; auto stride = container.stride() ; stride *= channels ; MultiArrayView < dimension , ele_type > channel_container ( container.shape() , stride , base ) ; return channel_view_type ( core.shape() , spline_degree , bcv , tolerance , 0 , channel_container // coefficient storage to 'adopt' ) ; } ; /// prefilter converts the knot point data in the 'core' area into b-spline /// coefficients. Bracing/framing will be applied. Even if the degree of the /// spline is zero or one, prefilter() can be called because it also /// performs the bracing. /// the arithmetic of the prefilter is performed in 'math_ele_type', which /// defaults to the vigra RealPromoted elementary type of the spline's /// value_type. This default ensures that integral knot point data are /// handled appropriately and prefiltering them will only suffer from /// quantization errors, which may be acceptable if the dynamic range /// is sufficiently large. /// 'boost' is an additional factor which will be used to amplify the /// incoming signal. This is intended for cases where the range of the /// signal has to be widened to fill the dynamic range of the signal's /// data type (specifically if it is an integral type). User code has /// to deal with the effects of boosting the signal, the bspline object /// holds no record of the 'boost' being applied.
When evaluating a /// spline with boosted coefficients, user code will have to provide /// code to attenuate the resulting signal back into the original /// range; for an easy way of doing so, see vspline::amplify_type, /// which is a type derived from vspline::unary_functor providing /// multiplication with a factor. There is an example of its application /// in the context of an integer-valued spline in int_spline.cc. template < class math_ele_type = typename vigra::NumericTraits < ele_type > :: RealPromote , size_t vsize = vspline::vector_traits < math_ele_type > :: size > void prefilter ( vspline::xlf_type boost = vspline::xlf_type ( 1 ) ) { // we assume data are already in 'core' and we operate in-place // prefilter first, passing in BC codes to pick out the appropriate functions to // calculate initial causal and anticausal coefficient, then 'brace' result. // note how, just as in brace(), the whole frame is filled, which may be more // than is strictly needed by the evaluator. vspline::prefilter < dimension , value_type , value_type , math_ele_type , vsize > ( core , core , bcv , spline_degree , tolerance , boost ) ; brace() ; } /// If data are passed in, they have to have precisely the shape /// we have set up in core (_core_shape passed into the constructor). /// These data will then be used in place of any data present in the /// bspline object to calculate the coefficients. They won't be looked at /// after prefilter() terminates, so it's safe to pass in a MultiArrayView /// which is destroyed after the call to prefilter() returns. Any data /// residing in the bspline object's memory will be overwritten. /// here, the default math_ele_type ensures that math_ele_type is /// appropriate for both T and ele_type.
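vspline's prefilter (see prefilter.h) handles arbitrary dimension and degree with multithreading and SIMD; the arithmetic at its core can be sketched in 1-D for the common cubic case. This is the classic recursive scheme with the single pole z1 = sqrt(3) - 2 and overall gain 6, using MIRROR boundary treatment with a truncated initialization sum; prefilter_cubic_1d is a hypothetical standalone helper, not vspline's API:

```cpp
#include <vector>
#include <cmath>
#include <cassert>

// 1-D cubic b-spline prefilter with MIRROR boundary handling:
// one causal and one anticausal first-order recursive filter pass
std::vector < double > prefilter_cubic_1d ( std::vector < double > x )
{
  const double z1 = std::sqrt ( 3.0 ) - 2.0 ; // the cubic prefilter's pole
  const double lambda = 6.0 ;                 // gain ( 1 - z1 ) * ( 1 - 1 / z1 )
  const int n = (int) x.size() ;
  for ( auto & v : x ) v *= lambda ;
  // initial causal coefficient: truncated sum over the (mirrored) signal
  double sum = x[0] , zk = z1 ;
  for ( int k = 1 ; k < n ; k++ , zk *= z1 ) sum += zk * x[k] ;
  x[0] = sum ;
  // causal filter pass
  for ( int k = 1 ; k < n ; k++ ) x[k] += z1 * x[k-1] ;
  // initial anticausal coefficient for mirror boundaries
  x[n-1] = ( z1 / ( z1 * z1 - 1.0 ) ) * ( x[n-1] + z1 * x[n-2] ) ;
  // anticausal filter pass
  for ( int k = n - 2 ; k >= 0 ; k-- ) x[k] = z1 * ( x[k+1] - x[k] ) ;
  return x ;
}
```

The prefilter inverts the cubic reconstruction operator: convolving the resulting coefficients with the kernel (1, 4, 1) / 6 recovers the original samples, and a constant signal yields constant coefficients.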
template < typename T , typename math_ele_type = typename vigra::NumericTraits < typename vigra::PromoteTraits < ele_type , T > :: Promote > :: RealPromote , size_t vsize = vspline::vector_traits < math_ele_type > :: size > void prefilter ( const vigra::MultiArrayView < dimension , T > & data , vspline::xlf_type boost = vspline::xlf_type ( 1 ) ) { // if the user has passed in data, they have to have precisely the shape // we have set up in core (_core_shape passed into the constructor). // This can have surprising effects if the container array isn't owned by the // spline but constitutes a view to data kept elsewhere (by passing _space to the // constructor): the data held by whatever constructed the bspline object // will be overwritten with the (prefiltered) data passed in via 'data'. // Whatever data have been in the core will be overwritten. if ( data.shape() != core.shape() ) throw shape_mismatch ( "when passing data to prefilter, they have to have precisely the core's shape" ) ; // prefilter first, passing in BC codes to pick out the appropriate functions to // calculate initial causal and anticausal coefficient, then 'brace' result. // note how, just as in brace(), the whole frame is filled, which may be more // than is strictly needed by the evaluator. vspline::prefilter < dimension , T , value_type , math_ele_type , vsize > ( data , core , bcv , spline_degree , tolerance , boost ) ; brace() ; } /// if the spline coefficients are already known, they obviously don't need to be /// prefiltered. But in order to be used by vspline's evaluation code, they need to /// be 'braced' - the 'core' coefficients have to be surrounded by more coefficients /// covering the support the evaluator needs to operate without bounds checking /// inside the spline's defined range. brace() performs this operation. brace() /// assumes the bspline object has been set up with the desired initial parameters, /// so that the boundary conditions and metrics are already known and storage is /// available.
brace() can be called for a specific axis, or for the whole container /// by passing -1. void brace ( int axis = -1 ) ///< specific axis, -1: all { if ( axis == -1 ) { bracer < dimension , value_type > :: apply ( container , bcv , left_frame , right_frame ) ; } else { bracer < dimension , value_type > :: apply ( container , bcv[axis] , left_frame[axis] , right_frame[axis] , axis ) ; } } /// overloaded constructor for 1D splines. This is useful because if we don't /// provide it, the caller would have to pass TinyVector < T , 1 > instead of T /// for the shape and the boundary condition. // KFJ 2018-07-23 Now I'm using enable_if to provide the following // c'tor overload only if dimension == 1. This avoids an error when // declaring explicit specializations: for dimension != 1, the compiler // would try and create this c'tor overload, which mustn't happen. // with the enable_if, if dimension != 1, the code is not considered. template < typename = std::enable_if < _dimension == 1 > > bspline ( long _core_shape , // shape of knot point data int _spline_degree = 3 , // spline degree with reasonable default bc_code _bc = MIRROR , // boundary conditions and common default xlf_type _tolerance = -1.0 , // acceptable error (relative to unit pulse) int headroom = 0 , // additional headroom, for 'shifting' view_type _space = view_type() // coefficient storage to 'adopt' ) :bspline ( vigra::TinyVector < long , 1 > ( _core_shape ) , _spline_degree , bcv_type ( _bc ) , _tolerance , headroom , _space ) { } ; /// 'shift' will change the interpretation of the data in a bspline object. /// d is taken as a difference to add to the current spline degree. The coefficients /// remain the same, but creating an evaluator from the shifted spline will make /// the evaluator produce data *as if* the coefficients were those of a spline /// of the changed order. Shifting with positive d will effectively blur the /// interpolated signal, shifting with negative d will sharpen it.
/// For shifting to work, the spline has to have enough 'headroom', meaning that /// spline_degree + d, the new spline degree, has to be greater than or equal to 0 /// and no larger than the largest supported spline degree (lower forties) - /// and, additionally, there has to be a wide-enough brace to allow evaluation /// with the wider kernel of the higher-degree spline's reconstruction filter. /// So if a spline is set up with degree 0 and shifted to degree 5, it has to be /// constructed with an additional headroom of 3 (see the constructor). /// /// shiftable() is called with a desired change of spline_degree. If it /// returns true, interpreting the data in the container array as coefficients /// of a spline with the changed degree is safe. If not, the frame size is /// not sufficient or the resulting degree is invalid and shiftable() /// returns false. Note how the decision is merely technical: if the new /// degree is okay and the *frame* is large enough, the shift will be /// considered permissible. bool shiftable ( int d ) const { int new_degree = spline_degree + d ; if ( new_degree < 0 || new_degree > vspline_constants::max_degree ) return false ; shape_type new_left_brace = get_left_brace_size ( new_degree , bcv ) ; shape_type new_right_brace = get_right_brace_size ( new_degree , bcv ) ; if ( allLessEqual ( new_left_brace , left_frame ) && allLessEqual ( new_right_brace , right_frame ) ) { return true ; } return false ; } /// shift() actually changes the interpretation of the data. The data /// will be taken to be coefficients of a spline with degree /// spline_degree + d, and the original degree is lost. This operation /// is only performed if it is technically safe (see shiftable()). /// If the shift was performed successfully, this function returns true, /// false otherwise. /// Note that, rather than 'shifting' the b-spline object, it's also /// possible to use a 'shifted' evaluator to produce the same result. /// See class evaluator's constructor.
bool shift ( int d ) { if ( shiftable ( d ) ) { spline_degree += d ; return true ; } return false ; } /// helper function to pretty-print a bspline object to an ostream friend std::ostream & operator<< ( std::ostream & osr , const bspline & bsp ) { osr << "dimension:................... " << bsp.dimension << std::endl ; osr << "degree:...................... " << bsp.spline_degree << std::endl ; osr << "boundary conditions:......... " ; for ( auto bc : bsp.bcv ) osr << " " << bc_name [ bc ] ; osr << std::endl ; osr << "shape of container array:.... " << bsp.container.shape() << std::endl ; osr << "shape of core:............... " << bsp.core.shape() << std::endl ; osr << "left frame:.................. " << bsp.left_frame << std::endl ; osr << "right frame:................. " << bsp.right_frame << std::endl ; osr << "ownership of data:........... " ; osr << ( bsp._p_coeffs->hasData() ? "bspline object owns data" : "data are owned externally" ) << std::endl ; osr << "container base address:...... " << bsp.container.data() << std::endl ; osr << "core base address:........... " << bsp.core.data() << std::endl ; return osr ; } } ; /// using declaration for a coordinate suitable for bspline, given /// elementary type rc_type. This produces the elementary type itself /// if the spline is 1D, a TinyVector of rc_type otherwise.
template < class spline_type , typename rc_type > using bspl_coordinate_type = vspline::canonical_type < rc_type , spline_type::dimension > ; /// using declaration for a bspline's value type template < class spline_type > using bspl_value_type = typename spline_type::value_type ; } ; // end of namespace vspline #endif // VSPLINE_BSPLINE_H kfj-vspline-4b365417c271/common.h000066400000000000000000000246671333775006700163730ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* The git repository for this software is at */ /* */ /* https://bitbucket.org/kfj/vspline */ /* */ /* Please direct questions, bug reports, and contributions to */ /* */ /* kfjahnke+vspline@gmail.com */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /*! \file common.h \brief definitions common to all files in this project, utility code This file contains - some common enums and strings - definition of a few utility types used throughout vspline - exceptions used throughout vspline It includes vspline/vector.h which defines vspline's use of vectorization (meaning SIMD operation) and associated types and code. */ #ifndef VSPLINE_COMMON #define VSPLINE_COMMON #include #include namespace vspline { /// This enumeration is used for codes connected to boundary conditions. /// There are two aspects to boundary conditions: During prefiltering, /// the initial causal and anticausal coefficients have to be calculated /// in a way specific to the chosen boundary conditions. Bracing also needs /// these codes to pick the appropriate extrapolation code to extrapolate /// the coefficients beyond the core array. typedef enum { MIRROR , // mirror on the bounds, so that f(-x) == f(x) PERIODIC, // periodic boundary conditions REFLECT , // reflect, so that f(-1) == f(0) (mirror between bounds) NATURAL, // natural boundary conditions, f(-x) + f(x) == 2 * f(0) CONSTANT , // clamp. used for framing, with explicit prefilter scheme ZEROPAD , // used for boundary condition, bracing GUESS , // used instead of ZEROPAD to keep margin errors lower INVALID } bc_code; /// bc_name is for diagnostic output of bc codes const std::string bc_name[] = { "MIRROR " , "PERIODIC ", "REFLECT " , "NATURAL ", "CONSTANT " , "ZEROPAD " , "GUESS " } ; // if 'USE_QUAD' is defined, the literals in poles.h will be interpreted // as quad literals (with 'Q' suffix), and xlf_type will be __float128.
// otherwise, the suffix used will be L and xlf_type is long double. // why 'XLF'? it's for 'eXtra Large Float'. // TODO: on my system, I tried to compile with // g++ -std=gnu++11 -fext-numeric-literals -DUSE_QUAD but I got // 'error: unable to find numeric literal operator ‘operator""Q’' // I leave this issue open, hoping that using quad literals will // become unproblematic with future g++ releases. The machinery to // use them is all in place and ready, and the literals in poles.h // are precise enough to be used for even higher precision, with // 64 postcomma digits. should be reasonably future-proof ;) #ifdef USE_QUAD #define XLF(arg) (arg##Q) typedef __float128 xlf_type ; #else #define XLF(arg) (arg##L) typedef long double xlf_type ; #endif // #ifdef USE_QUAD /// using definition for the 'elementary type' of a type via vigra's /// ExpandElementResult mechanism. Since this is a frequently used idiom /// in vspline but quite a mouthful, here's an abbreviation: template < typename T > using ET = typename vigra::ExpandElementResult < T > :: type ; /// produce a std::integral_constant from the size obtained from /// vigra's ExpandElementResult mechanism template < typename T > using EN = typename std::integral_constant < int , vigra::ExpandElementResult < T > :: size > :: type ; /// is_element_expandable tests if a type T is known to vigra's /// ExpandElementResult mechanism. If this is so, the type is /// considered 'element-expandable'. template < class T > using is_element_expandable = typename std::integral_constant < bool , ! std::is_same < typename vigra::ExpandElementResult < T > :: type , vigra::UnsuitableTypeForExpandElements > :: value > :: type ; /// produce the 'canonical type' for a given type T. If T is /// single-channel, this will be the elementary type itself, /// otherwise it's a TinyVector of the elementary type.
/// optionally takes the number of elements the resulting /// type should have, to allow construction from a fundamental /// and a number of channels. // vspline 'respects the singular'. this is to mean that throughout // vspline I avoid using fixed-size containers holding a single value // and use the single value itself. While this requires some type // artistry in places, it makes using the code more natural. Only // when it comes to providing generic code to handle data which may // or may not be aggregates, vspline moves to using 'synthetic' types, // which are always TinyVectors, possibly with only one element. // user code can pass data as canonical or synthetic types. All // higher-level operations will produce the same type as output // as they receive as input. The move to 'synthetic' types is only // done internally to ease the writing of generic code. // both the 'canonical' and the 'synthetic' type have lost all // 'special' meaning of their components (like their meaning as // the real and imaginary part of a complex number). vspline will // process all types in its scope as such 'neutral' aggregates of // several identically-typed elementary values. template < typename T , int N = EN < T > :: value > using canonical_type = typename std::conditional < N == 1 , ET < T > , vigra::TinyVector < ET < T > , N > > :: type ; template < typename T , int N = EN < T > :: value > using synthetic_type = vigra::TinyVector < ET < T > , N > ; template < class T , size_t sz = 1 > struct invalid_scalar { static_assert ( sz == 1 , "scalar values can only take size 1" ) ; } ; /// definition of a scalar with the same template argument list as /// a simdized type, to use 'scalar' in the same syntactic slot template < class T , size_t sz = 1 > using scalar = typename std::conditional < sz == 1 , T , invalid_scalar < T , sz > > :: type ; /// vspline creates vigra::MultiArrays of vectorized types.
As long as /// the vectorized types are Vc::SimdArray or vspline::simd_tv, using /// std::allocator is fine, but when using other types, using a specific /// allocator may be necessary. Currently this is never the case, but I /// have the lookup of allocator type from this traits class in place if /// it should become necessary. template < typename T > struct allocator_traits { typedef std::allocator < T > type ; } ; // TODO My use of exceptions is a bit sketchy... /// for interfaces which need specific implementations we use: struct not_implemented : std::invalid_argument { not_implemented ( const char * msg ) : std::invalid_argument ( msg ) { } ; } ; /// dimension_mismatch is thrown if two arrays which should have the /// same dimensions have different ones. struct dimension_mismatch : std::invalid_argument { dimension_mismatch ( const char * msg ) : std::invalid_argument ( msg ) { } ; } ; /// shape mismatch is the exception which is thrown if the shapes of /// an input array and an output array do not match. struct shape_mismatch : std::invalid_argument { shape_mismatch ( const char * msg ) : std::invalid_argument ( msg ) { } ; } ; /// exception which is thrown if an operation is requested which vspline /// does not support struct not_supported : std::invalid_argument { not_supported ( const char * msg ) : std::invalid_argument ( msg ) { } ; } ; /// out_of_bounds is thrown by mapping mode REJECT for out-of-bounds coordinates /// this exception is left without a message, it only has a very specific application, /// and there it may be thrown often, so we don't want anything slowing it down.
struct out_of_bounds { } ; /// exception which is thrown when assigning an rvalue which is larger than /// what the lvalue can hold struct numeric_overflow : std::invalid_argument { numeric_overflow ( const char * msg ) : std::invalid_argument ( msg ) { } ; } ; } ; // end of namespace vspline // with these common definitions done, we now include 'vector.h', which // defines vectorization methods used throughout vspline. #include "vector.h" #endif // VSPLINE_COMMON kfj-vspline-4b365417c271/convolve.h000066400000000000000000000433651333775006700167320ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* The git repository for this software is at */ /* */ /* https://bitbucket.org/kfj/vspline */ /* */ /* Please direct questions, bug reports, and contributions to */ /* */ /* kfjahnke+vspline@gmail.com */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /*! \file convolve.h \brief separable convolution of nD arrays This file provides the core filtering code for convolution, which can be used by itself to filter 1D arrays, or is used with the 'wielding' code in filter.h to filter nD arrays. The latter use is what's used throughout most of vspline, since it provides automatic multithreading and vectorization by buffering the data and applying the 1D code to the buffer. The implementation of convolution in this file can safely operate in-place. The actual convolution operation is done using a small kernel-sized circular buffer, which is multiplied with an adequately shifted and rotated representation of the kernel. This is done avoiding conditionals as best as possible. The 1D data are extrapolated with one of the boundary condition codes known to class extrapolator (see extrapolate.h). This is done transparently by putting extrapolated data into the small circular buffer where this is needed. The code is trivial insofar as it only uses indexed assignments, addition and multiplication. So it can operate on a wide variety of data types, prominently among them SIMD vector types. Note how I use the kernel front-to-back, in the same forward sequence as the data it is applied to. This is contrary to the normal convention of using the kernel values back-to-front. Inside vspline, where only symmetrical kernels are used, this makes no difference, but when vspline's convolution code is used for other convolutions, this has to be kept in mind.
*/ #ifndef VSPLINE_CONVOLVE_H #define VSPLINE_CONVOLVE_H #include "common.h" #include "extrapolate.h" #include "filter.h" namespace vspline { /// fir_filter_specs holds the parameters for a filter performing /// a convolution along a single axis. In vspline, the place where /// the specifications for a filter are fixed and the place where /// it is finally created are far apart: the filter is created /// in the separate worker threads. So this structure serves as /// a vehicle to transport the arguments. /// Note the specification of 'headroom': this allows for /// non-symmetrical and even kernels. When applying the kernel /// to obtain output[i], the kernel is applied to /// input [ i - headroom ] , ... , input [ i - headroom + ksize - 1 ] struct fir_filter_specs { vspline::bc_code bc ; // boundary conditions int ksize ; // kernel size int headroom ; // part of kernel 'to the left' const xlf_type * kernel ; // pointer to kernel values fir_filter_specs ( vspline::bc_code _bc , int _ksize , int _headroom , const xlf_type * _kernel ) : bc ( _bc ) , ksize ( _ksize ) , headroom ( _headroom ) , kernel ( _kernel ) { assert ( headroom < ksize ) ; } ; } ; /// class fir_filter provides the 'solve' routine which convolves /// a 1D signal with selectable extrapolation. Here, the convolution /// kernel is applied to the incoming signal and the result is written /// to the specified output location. Note that this operation /// can be done in-place, but input and output may also be different. /// While most of the time this routine will be invoked by class /// convolve (below), it is also directly used by the specialized /// code for 1D filtering. /// Note how we conveniently inherit from the specs class. This also /// enables us to use an instance of fir_filter or class convolve /// as specs argument to create further filters with the same arguments. // TODO: some kernels are symmetric, which might be exploited. 
// TODO: special code for filters with 0-valued coefficients, like // sinc-derived half band filters template < typename in_type , typename out_type = in_type , typename _math_type = out_type > struct fir_filter : public fir_filter_specs { // this filter type does not need storage of intermediate results. static const bool is_single_pass { true } ; typedef vigra::MultiArrayView < 1 , in_type > in_buffer_type ; typedef vigra::MultiArrayView < 1 , out_type > out_buffer_type ; typedef _math_type math_type ; // we put all state data into a single area of memory called 'reactor'. // The separate parts holding the small circular buffer, the repeated // kernel and the tail buffer are implemented as views to 'reactor'. // This way, all data participating in the arithmetics are as close // together in memory as possible. // note how the current implementation therefore holds the kernel // values in the 'reactor' as simdized types (if math_type is simdized). // this may be suboptimal, since the kernel values might be supplied // as scalars and could be kept in a smaller area of memory.
// TODO: investigate using allocator_t = typename vspline::allocator_traits < math_type > :: type ; vigra::MultiArray < 1 , math_type , allocator_t > reactor ; vigra::MultiArrayView < 1 , math_type > circular_buffer ; vigra::MultiArrayView < 1 , math_type > kernel_values ; vigra::MultiArrayView < 1 , math_type > tail_buffer ; fir_filter ( const fir_filter_specs & specs ) : fir_filter_specs ( specs ) , reactor ( vigra::Shape1 ( specs.ksize * 4 ) ) { circular_buffer = reactor.subarray ( vigra::Shape1 ( 0 ) , vigra::Shape1 ( ksize ) ) ; kernel_values = reactor.subarray ( vigra::Shape1 ( ksize ) , vigra::Shape1 ( ksize * 3 ) ) ; tail_buffer = reactor.subarray ( vigra::Shape1 ( ksize * 3 ) , vigra::Shape1 ( ksize * 4 ) ) ; for ( int i = 0 ; i < ksize ; i++ ) kernel_values [ i ] = kernel_values [ i + ksize ] = kernel [ i ] ; } ; /// calling code may have to set up buffers with additional /// space around the actual data to allow filtering code to /// 'run up' to the data, shedding margin effects in the /// process. We stay on the safe side and return the width /// of the whole kernel, which is always sufficient to /// provide safe runup. int get_support_width() const { return ksize ; } /// public 'solve' routine. This is for calls 'from outside', /// like when this object is used by itself, not as a base class /// of class convolve below. /// an extrapolator for the boundary condition code 'bc' /// (see fir_filter_specs) is made, then the call is delegated /// to the protected routine below which accepts an extrapolator /// on top of input and output. void solve ( const in_buffer_type & input , out_buffer_type & output ) { int size = output.size() ; extrapolator < in_buffer_type > source ( bc , input ) ; solve ( input , output , source ) ; } protected: /// protected solve routine taking an extrapolator on top of /// input and output. 
This way, the derived class (class convolve) /// can maintain an extrapolator fixed to its buffer and reuse /// it for subsequent calls to this routine. /// we use the following strategy: /// - keep a small circular buffer as large as the kernel /// - have two kernels concatenated in another buffer /// - by pointing into the concatenated kernels, we can always /// have ksize kernel values in sequence so that this sequence /// is correct for the values in the circular buffer. /// this strategy avoids conditionals as best as possible and /// should be easy to optimize. the actual code is a bit more /// complex to account for the fact that at the beginning and /// end of the data, a few extrapolated values are used. The /// central loop can directly read from input without using the /// extrapolator, which is most efficient. void solve ( const in_buffer_type & input , out_buffer_type & output , const extrapolator < in_buffer_type > & source ) { if ( ksize < 1 ) { // if kernel size is zero or even negative, then, // if operation isn't in-place, copy input to output if ( (void*) ( input.data() ) != (void*) ( output.data() ) ) { for ( std::ptrdiff_t i = 0 ; i < output.size() ; i++ ) output[i] = out_type ( input[i] ) ; } return ; // we're done prematurely } else if ( ksize == 1 ) { // for kernel size 1 we perform the multiplication of the // single kernel value with the input in a simple loop without // using the circular buffering mechanism below. This is an // optimization, the circular buffer code can also handle // single-value kernels.
math_type factor ( kernel[0] ) ; for ( std::ptrdiff_t i = 0 ; i < output.size() ; i++ ) output[i] = out_type ( factor * math_type ( input[i] ) ) ; return ; // we're done prematurely } int si = - headroom ; // read position int ti = 0 ; // store position // initialize circular buffer using the extrapolator // note: initially I coded to fetch only the first 'headroom' // values from the extrapolator, then up to ksize straight // from 'input'. but this is *not* correct: 'input' may be // very small, and with a large kernel we also need the // extrapolator further on after the input is already // consumed. So this is the correct way of doing it: for ( int i = 0 ; i < ksize ; i++ , si++ ) circular_buffer[i] = source ( si ) ; // see how many full cycles we can run, directly accessing // 'input' without resorting to extrapolation int size = output.size() ; int leftover = size - si ; int full_cycles = 0 ; if ( leftover > 0 ) full_cycles = leftover / ksize ; // stash the trailing extrapolated values: we want to be able // to operate in-place, and if we write to the buffer we can't // use the extrapolator over it anymore. note how we only fill // in ksize - headroom values. this is all we'll need, the buffer // may be slightly larger.
int ntail = ksize - headroom ; int z = size ; for ( int i = 0 ; i < ntail ; i++ , z++ ) tail_buffer[i] = source ( z ) ; // central loop, reading straight from input without extrapolation for ( int cycle = 0 ; cycle < full_cycles ; cycle++ ) { auto p_kernel = kernel_values.data() + ksize ; auto p_data = circular_buffer.data() ; for ( int i = 0 ; i < ksize ; ) { // perform the actual convolution // TODO: exploit symmetry math_type result = circular_buffer[0] * p_kernel[0] ; for ( int j = 1 ; j < ksize ; j++ ) result += circular_buffer[j] * p_kernel[j] ; // stash result output [ ti ] = out_type ( result ) ; // fetch next input value * p_data = input [ si ] ; // adjust pointers and indices ++ si ; ++ ti ; ++ i ; if ( i == ksize ) break ; ++ p_data ; -- p_kernel ; } } // produce the last few values, resorting to tail_buffer // where it is necessary while ( ti < size ) { auto p_kernel = kernel_values.data() + ksize ; auto p_data = circular_buffer.data() ; for ( int i = 0 ; i < ksize && ti < size ; i++ ) { math_type result = circular_buffer[0] * p_kernel[0] ; for ( int j = 1 ; j < ksize ; j++ ) result += circular_buffer[j] * p_kernel[j] ; output [ ti ] = out_type ( result ) ; if ( si < size ) // still sweet * p_data = input [ si ] ; else // input used up, use stashed extrapolated values * p_data = tail_buffer [ si - size ] ; ++ si ; ++ ti ; ++ p_data ; -- p_kernel ; } } } } ; /// class convolve provides the combination of class fir_filter /// above with a vector-friendly buffer. Calling code provides /// information about what should be buffered, the data are sucked /// into the buffer, filtered, and moved back from there. /// The operation is orchestrated by the code in filter.h, which /// is also used to 'wield' the b-spline prefilter. Both operations /// are sufficiently similar to share the wielding code. 
template < template < typename , size_t > class _vtype , typename _math_ele_type , size_t _vsize > struct convolve : public buffer_handling < _vtype , _math_ele_type , _vsize > , public vspline::fir_filter < _vtype < _math_ele_type , _vsize > > { // provide this type for queries typedef _math_ele_type math_ele_type ; // we'll use a few types from the buffer_handling type typedef buffer_handling < _vtype , _math_ele_type , _vsize > buffer_handling_type ; using typename buffer_handling_type::vtype ; using buffer_handling_type::vsize ; using buffer_handling_type::init ; // instances of class convolve hold the buffer as state: using allocator_t = typename vspline::allocator_traits < vtype > :: type ; typedef vigra::MultiArray < 1 , vtype , allocator_t > buffer_type ; typedef vigra::MultiArrayView < 1 , vtype > buffer_view_type ; buffer_type buffer ; // and also an extrapolator, which is fixed to the buffer extrapolator < buffer_view_type > buffer_extrapolator ; // the filter's 'solve' routine has the workhorse code to filter // the data inside the buffer: typedef _vtype < _math_ele_type , _vsize > simdized_math_type ; typedef vspline::fir_filter < simdized_math_type > filter_type ; using filter_type::solve ; using filter_type::headroom ; // by defining arg_type, we allow code to infer what type of // initializer ('specs') the filter takes typedef fir_filter_specs arg_type ; // the constructor invokes the filter's constructor, // sets up the buffer and initializes the buffer_handling // component to use the whole buffer to accept incoming and // provide outgoing data. convolve ( const fir_filter_specs & specs , size_t size ) : filter_type ( specs ) , buffer ( size ) , buffer_extrapolator ( specs.bc , buffer ) { init ( buffer , buffer ) ; } ; // operator() simply delegates to the filter's 'solve' routine, // which filters the data in the buffer. 
Note how the solve // overload accepting an extrapolator is used: the extrapolator // remains the same, so there's no point creating a new one // with every call. void operator() () { solve ( buffer , buffer , buffer_extrapolator ) ; } // factory function to provide a filter with the same set of // parameters, but possibly different data types. this is used // for processing of 1D data, where the normal buffering mechanism // may be sidestepped. template < typename in_type , typename out_type = in_type , typename math_type = out_type > static vspline::fir_filter < in_type , out_type , math_type > get_raw_filter ( const fir_filter_specs & specs ) { return vspline::fir_filter < in_type , out_type , math_type > ( specs ) ; } } ; } ; // namespace vspline #endif // VSPLINE_CONVOLVE_H kfj-vspline-4b365417c271/domain.h000066400000000000000000000356511333775006700163450ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* The git repository for this software is at */ /* */ /* https://bitbucket.org/kfj/vspline */ /* */ /* Please direct questions, bug reports, and contributions to */ /* */ /* kfjahnke+vspline@gmail.com */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. 
*/ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /*! \file domain.h \brief code to perform combined scaling and translation on coordinates This is what might be called collateral code, class domain is not currently used in vspline. A common requirement is to map coordinates in one range to another range, effectively performing a combined scaling and translation. Given incoming coordinates in a range [ in_low , in_high ] and a desired range for outgoing coordinates of [ out_low , out_high ], and an incoming coordinate c, a vspline::domain performs this operation: c' = ( c - in_low ) * scale + out_low where scale = ( out_high - out_low ) / ( in_high - in_low ) The code can handle arbitrary dimensions, float and double coordinate elementary types, and, optionally, it can perform vectorized operations on vectorized coordinates. vspline::domain is derived from vspline::unary_functor and can be used like any other vspline::unary_functor. A common use case would be to access a vspline::evaluator with a different coordinate range than the spline's 'natural' coordinates (assuming a 1D spline of floats): auto _ev = vspline::make_safe_evaluator ( bspl ) ; auto ev = vspline::domain ( bspl , 0 , 100 ) + _ev ; ev.eval ( coordinate , result ) ; Here, the domain is built over the spline with an incoming range of [ 0 , 100 ], so evaluating at 100 will be equivalent to evaluating _ev at bspl.upper_limit(). 
*/ #ifndef VSPLINE_DOMAIN_H #define VSPLINE_DOMAIN_H #include "unary_functor.h" #include namespace vspline { /// class domain is a coordinate transformation functor. It provides /// a handy way to translate an arbitrary range of incoming coordinates /// to an arbitrary range of outgoing coordinates. This is done with a /// linear translation function. if the source range is [s0,s1] and the /// target range is [t0,t1], the translation function s->t is: /// /// t = ( s - s0 ) * ( t1 - t0 ) / ( s1 - s0 ) + t0 /// /// In words: the target coordinate's distance from the target range's /// lower bound is proportional to the source coordinate's distance /// from the source range's lower bound. Note that this is *not* a /// gate function: the domain will accept any incoming value and /// perform the shift/scale operation on it; incoming values outside /// [ in_low , in_high ] will produce outgoing values outside /// [ out_low , out_high ]. /// /// The first constructor takes s0, s1, t0 and t1. With this functor, /// arbitrary mappings of the form given above can be achieved. /// The second constructor takes a vspline::bspline object to obtain /// t0 and t1. These are taken as the spline's 'true' range, depending /// on its boundary conditions: for periodic splines, this is [0...M], /// for REFLECT BCs it's [-0.5,M-0.5], and for 'normal' splines it's /// [0,M-1]. s0 and s1, the start and end of the domain's coordinate range, /// can be passed in and default to 0 and 1, which constitutes 'normalized' /// spline coordinates, where 0 is mapped to the lower end of the 'true' /// range and 1 to the upper. /// /// class domain is especially useful for situations where several b-splines /// cover the same data in different resolutions, like in image pyramids.
/// If these different splines are all evaluated with a domain chained to the
/// evaluator which uses a common domain range, they can all be accessed with
/// identical coordinates, even if the spline shapes don't match isotropically.
/// Note, though, that domain objects created with b-splines with REFLECT
/// or PERIODIC boundary conditions won't match unless the splines have the
/// same size, because the splines' bounds won't coincide for differently
/// sized splines. You may want to create the domain object from coordinate
/// values in this case.
///
/// The evaluation routine in class domain_type makes sure that incoming
/// values in [ in_low , in_high ] will never produce outgoing values
/// outside [ out_low , out_high ]. If this guarantee is not needed, the
/// 'raw' evaluation routine _eval can be used instead. With _eval, output
/// may overshoot out_high slightly.
///
/// I should mention libeinspline here, which has this facility as a fixed
/// feature in its spline types. I decided to keep it separate and create
/// class domain instead for those cases where the functionality is needed.
template < typename coordinate_type ,
           size_t _vsize = vspline::vector_traits < coordinate_type > :: vsize >
struct domain_type
: public vspline::unary_functor < coordinate_type , coordinate_type , _vsize > ,
  public vspline::callable < domain_type < coordinate_type , _vsize > ,
                             coordinate_type , coordinate_type , _vsize >
{
  typedef vspline::unary_functor
          < coordinate_type , coordinate_type , _vsize > base_type ;

  using base_type::dim_in ;
  using base_type::vsize ;
  using typename base_type::in_type ;
  using typename base_type::in_ele_type ;
  using typename base_type::out_type ;
  using typename base_type::in_v ;
  using typename base_type::in_ele_v ;
  using typename base_type::out_v ;

  typedef typename base_type::in_ele_type rc_type ;
  typedef typename base_type::in_nd_ele_type nd_rc_type ;
  typedef typename vigra::TinyVector < rc_type , dim_in > limit_type ;

  // internally, we work with definite TinyVectors:

  const limit_type out_low , in_low , out_high , in_high ;
  const limit_type scale ;

  /// constructor taking the lower and upper fix points
  /// for incoming and outgoing values

  domain_type ( const coordinate_type & _in_low ,
                const coordinate_type & _in_high ,
                const coordinate_type & _out_low ,
                const coordinate_type & _out_high )
  : in_low ( _in_low ) ,
    out_low ( _out_low ) ,
    in_high ( _in_high ) ,
    out_high ( _out_high ) ,
    scale ( ( _out_high - _out_low ) / ( _in_high - _in_low ) )
  {
    assert ( in_low != in_high && out_low != out_high ) ;
  }

  /// constructor taking the fix points for outgoing values
  /// from a bspline object, and the incoming lower and upper
  /// fix points explicitly

  template < class bspl_type >
  domain_type ( const bspl_type & bspl ,
                const coordinate_type & _in_low = coordinate_type ( 0 ) ,
                const coordinate_type & _in_high = coordinate_type ( 1 ) )
  : in_low ( _in_low ) ,
    in_high ( _in_high ) ,
    out_low ( bspl.lower_limit() ) ,
    out_high ( bspl.upper_limit() ) ,
    scale ( ( bspl.upper_limit() - bspl.lower_limit() )
            / ( _in_high - _in_low ) )
  {
    static_assert ( dim_in == bspl_type::dimension ,
"can only create domain from spline if dimensions match" ) ; assert ( in_low != in_high ) ; } private: /// 'polish' simply checks if in is == in_high and returns /// out_high for that case. This looks silly, but depending /// on the scaling factor, out may overshoot out_high if /// in == in_high. Hence this routine. void polish ( const in_type & in , out_type & out , int d ) const { out = ( in == in_high[d] ) ? out_high[d] : out ; } /// now the vectorized version. Here we can use /// 'proper' vector code with masking: the equality /// test yields a SIMD mask. template < typename T , int N > void polish ( const vspline::simdized_type < T , N > & in , vspline::simdized_type < T , N > & out , int d ) const { auto mask = ( in == in_high[d] ) ; assign_if ( out , mask , out_high[d] ) ; } public: // next we have variants of eval(). Depending on circumstances, // we have a fair number of distinct cases to deal with: we may // receive vectorized or unvectorized data, possibly of several // dimensions, and the vector data may or may not use Vc SIMD types. /// eval for unvectorized data. we dispatch on in_type, the incoming /// coordinate, being a 'naked' fundamental or a TinyVector void eval ( const in_type & in , out_type & out ) const { eval ( in , out , std::is_fundamental < in_type > () ) ; } /// eval dispatch for fundamental, 1D coordinates void eval ( const in_type & in , out_type & out , std::true_type // in_type is a naked fundamental ) const { if ( in == in_high[0] ) out = out_high[0] ; else out = ( in - in_low[0] ) * scale[0] + out_low[0] ; } /// eval dispatch for TinyVectors of fundamentals /// - unvectorized nD cordinates void eval ( const in_type & in , out_type & out , std::false_type // in_type is a TinyVector ) const { for ( int d = 0 ; d < dim_in ; d++ ) { if ( in[d] == in_high[d] ) out[d] = out_high[d] ; else out[d] = ( in[d] - in_low[d] ) * scale[d] + out_low[d] ; } } /// eval for vectorized data. 
  /// if in_type is a 'naked' fundamental, crd_type is a 'naked' ele_v,
  /// otherwise it's a TinyVector of ele_v. So we dispatch:

  template < class crd_type >
  void eval ( const crd_type & in , crd_type & out ) const
  {
    eval ( in , out , std::is_fundamental < in_type > () ) ;
  }

  /// eval dispatch for vectorized coordinates, 1D case

  template < typename = std::enable_if < ( vsize > 1 ) > >
  void eval ( const in_v & in , out_v & out ,
              std::true_type // crd_type is a 'naked' ele_v
            ) const
  {
    out = ( in - in_low[0] ) * scale[0] + out_low[0] ;

    // we dispatch to one of two variants of 'polish', depending on
    // whether crd_type, which is a vspline::vector, internally uses
    // a Vc type or a TinyVector:

    polish < rc_type , vsize > ( in , out , 0 ) ;
  }

  /// eval dispatch for vectorized coordinates, nD case. This iterates over
  /// the dimensions and performs the 1D operation for each component.

  template < typename = std::enable_if < ( vsize > 1 ) > >
  void eval ( const in_v & in , out_v & out ,
              std::false_type
            ) const
  {
    typedef typename in_v::value_type component_type ;

    for ( int d = 0 ; d < dim_in ; d++ )
    {
      out[d] = ( in[d] - in_low[d] ) * scale[d] + out_low[d] ;
      polish < rc_type , vsize > ( in[d] , out[d] , d ) ;
    }
  }

} ;

/// factory function to create a domain_type object from the
/// desired lower and upper fix point for incoming coordinates and
/// the lower and upper fix point for outgoing coordinates.
/// the resulting functor maps incoming coordinates in the range of
/// [in_low,in_high] to coordinates in the range of [out_low,out_high]

template < class coordinate_type ,
           size_t _vsize = vspline::vector_traits < coordinate_type > :: vsize >
vspline::domain_type < coordinate_type , _vsize >
domain ( const coordinate_type & in_low ,
         const coordinate_type & in_high ,
         const coordinate_type & out_low ,
         const coordinate_type & out_high )
{
  return vspline::domain_type < coordinate_type , _vsize >
         ( in_low , in_high , out_low , out_high ) ;
}

/// factory function to create a domain_type object
/// from the desired lower and upper reference point for incoming
/// coordinates and a vspline::bspline object providing the lower
/// and upper reference for outgoing coordinates.
/// the resulting functor maps incoming coordinates in the range of
/// [ in_low , in_high ] to coordinates in the range of
/// [ bspl.lower_limit() , bspl.upper_limit() ]

template < class coordinate_type ,
           class spline_type ,
           size_t _vsize = vspline::vector_traits < coordinate_type > :: vsize >
vspline::domain_type < coordinate_type , _vsize >
domain ( const spline_type & bspl ,
         const coordinate_type & in_low = coordinate_type ( 0 ) ,
         const coordinate_type & in_high = coordinate_type ( 1 ) )
{
  return vspline::domain_type < coordinate_type , _vsize >
         ( bspl , in_low , in_high ) ;
}

} ; // namespace vspline

#endif // #ifndef VSPLINE_DOMAIN_H

kfj-vspline-4b365417c271/doxy.h

/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F.
Jahnke */ /* */ /* The git repository for this software is at */ /* */ /* https://bitbucket.org/kfj/vspline */ /* */ /* Please direct questions, bug reports, and contributions to */ /* */ /* kfjahnke+vspline@gmail.com */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ // This header doesn't contain any code, only the text for the main page of the documentation. /*! \mainpage \section intro_sec Introduction vspline is a header-only generic C++ library for the creation and use of uniform B-splines. It aims to be as comprehensive as feasibly possible, yet at the same time producing code which performs well, so that it can be used in production. vspline is CPU-based, it does not use the GPU. vspline's main focus is interpolation of bulk nD raster data. 
It was developed originally to be used for image processing software. In image processing, oftentimes large amounts of pixels need to be submitted to identical operations, suggesting a functional approach. vspline offers functional programming elements to implement such programs.

vspline was developed on a Linux system using clang++ and g++. It has not been tested much with other systems or compilers, and as of this writing I am aware that the code probably isn't fully portable. The code uses the C++11 standard.

Note: in November 2017, with help from Bernd Gaucke, vspline's companion program pv, which uses vspline heavily, was successfully compiled with 'Visual Studio Platform toolset V141'. While no further tests have been done, I hope that I can soon extend the list of supported platforms.

vspline relies on one other library:

- VIGRA, mainly for data handling using vigra's MultiArrayView and TinyVector types

Optionally, vspline can make use of Vc:

- Vc, for the use of explicit vectorization

I find VIGRA indispensable; omitting it from vspline is not really an option. Use of Vc is optional, though, and may be activated by defining 'USE_VC'. This should be done by passing -D USE_VC to the compiler; defining USE_VC only for parts of a project may or may not work. Please note that vspline uses Vc's 1.3 branch, not the master branch. 1.3 is what you are likely to find in your distro's package repositories; if you check out Vc from github, make sure you pick the 1.3 branch.
I have made an attempt to generalize the code so that it can handle

- fundamental numeric data types and their aggregates (like, pixels)
- a reasonable selection of boundary conditions
- arbitrary spline degrees (up to 45)
- arbitrary dimensions of the spline
- in multithreaded code
- using the CPU's vector units if possible

On the evaluation side I provide

- evaluation of the spline at point locations in the defined range
- evaluation of the spline's derivatives
- mapping of arbitrary coordinates into the defined range
- evaluation of nD arrays of coordinates (remap function)
- generalized 'transform' and 'apply' functions
- restoration of the original data from the coefficients

To produce maximum performance, vspline also has a fair amount of collateral code, and some of this code may be helpful beyond vspline:

- range-based multithreading with a thread pool
- functional constructs using vspline::unary_functor*
- forward-backward n-pole recursive filtering*
- separable convolution*
- efficient access to the b-spline basis functions
- extremely precise precalculated constants

\*note that the filtering code and the transform-type routines multithread and vectorize automatically.

\section install_sec Installation

vspline is available as a debian package, so the easiest way to install vspline is to install the vspline package, if it is provided by your package manager. This will also take care of the dependencies and install the examples. Ubuntu should provide a vspline package from 18.04 on.

As of this writing, vspline is still under development, and the master branch in the git repository may offer new/different functionality compared to the code in the package. While I try to keep the 'higher-level' functions' and objects' interface consistent, the signatures may still change, especially in their default arguments, so switching from the packaged version to the master branch, or upgrading to a newer packaged version, may require changes to the calling code.
If you're installing manually, note that vspline is header-only, so it's sufficient to place the headers where your code can access them. They must all reside in the same folder. VIGRA (and optionally Vc) are supposed to be installed in a location where they can be found, so that includes along the lines of #include succeed.

\section compile_sec Compilation

While your distro's packages may be sufficient to get vspline up and running, you may need newer versions of VIGRA and Vc. At the time of this writing the latest versions commonly available were Vc 1.3.0 and VIGRA 1.11.0; I compiled Vc and VIGRA from source, using up-to-date pulls from their respective repositories. Vc 0.x.x will not work with vspline, and only Vc's 1.3 branch seems to be compatible with vspline. Using Vc only makes sense if you are aiming for maximum performance; vspline's code autovectorizes well and oftentimes performs just about as well without Vc, but if you are writing more complex pipelines, the gains from using explicit vectorization tend to increase.

Update: ubuntu 17.04 has vigra and Vc packages which are sufficiently up-to-date.

To compile software using vspline, I preferably use clang++:

~~~~~~~~~~~~~~
clang++ -pthread -O3 -march=native --std=c++11 your_code.cc
// or, using Vc:
clang++ -D USE_VC -pthread -O3 -march=native --std=c++11 your_code.cc -lVc
~~~~~~~~~~~~~~

-pthread is needed to run the automatic multithreading used for most of vspline's bulk data processing, like prefiltering or the transform routines. It's possible to disable multithreading by passing -D VSPLINE_SINGLETHREAD, in which case you can omit -pthread:

~~~~~~~~~~~~~~
clang++ -DVSPLINE_SINGLETHREAD --std=c++11 your_code.cc
~~~~~~~~~~~~~~

Optimization is essential to get decent performance, but if you just work with a few values or have trouble debugging optimized code, you can omit it. I have found that using -Ofast usually works as well. vspline uses the C++11 standard and will not compile with lower standard versions.
Using C++14 is fine. vigraimpex (VIGRA's image import/export library) is used by some of the examples; for these you'll need an additional -lvigraimpex.

Please note that an executable using vectorization produced for your system may likely not work on a machine with another CPU. It's best to compile for the intended target architecture explicitly using -march... 'Not work' in this context means that it may as well crash due to an illegal instruction or wrong alignment. If you're compiling for the same system you're on, you can use -march=native. If you don't specify the architecture for the compilation, you'll likely get some lowest-common-denominator binary (like, SSE) which will not exploit your hardware optimally, but run on a wider range of systems. Autovectorization will still be used even without Vc being present, and if you compile for a given target architecture, the binary may contain vector instructions specific to the architecture and will crash on targets not supporting the relevant ISA.

If you don't want to use clang++, g++ will also work. In my tests I found that clang++ produced faster code, but this may or may not be the case for specific compiler versions or target architectures. You'll have to try it out.

\section license_sec License

vspline is free software, licensed under this license:
vspline - a set of generic tools for creation and evaluation of uniform b-splines Copyright 2015 - 2018 by Kay F. Jahnke Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
\section strategy_sec Strategy

This section outlines the strategies used in vspline to get from original data to an interpolated result. If you prefer to start coding straight away, continue to the next section.

When you are using vspline, your information goes through several stages. Starting out with your original data, the first step is to 'prefilter' these data to obtain 'b-spline coefficients'. vspline keeps these coefficients in a 'bspline' object. Next you obtain an 'evaluator' using these coefficients. This is a functor; you call it with (real) coordinates and receive interpolated values. While you can produce your results by repeated calls to the evaluator, vspline also offers mass evaluation; a typical example is a 'classic' remap, where you provide an array full of coordinates and receive an array full of interpolated values. Using vspline's mass evaluation methods is a good idea, because this is done very efficiently using multithreading and hardware vectorization. I'll go through these stages in detail.

'direct' interpolation uses some arithmetic construct to derive an interpolated value from several data points in the 'vicinity' of the interpolation locus. The value of this arithmetic construct collapses to the data point itself if it coincides with the interpolation locus. b-splines are *not* direct interpolators: the interpolated value is not directly derived from an arithmetic construct involving the data points; instead it is derived from a set of coefficients which have been produced from the data points by 'prefiltering' them. This implies a sequence of operations which has to be followed in order to use b-spline interpolation:

- the 'original' data (aka 'knot point data') are 'prefiltered', yielding the b-spline's 'coefficients'

- for evaluation, a subset of the coefficients is selected and processed to yield the interpolated value.
This two-step process has one distinct disadvantage: the coefficients have to be held in memory until all evaluations of the spline are done. While there is no way around storing the coefficients, storing the original data is *not* necessary to evaluate the spline. So if the original data aren't used somewhere 'further down the line', they can be discarded as soon as the coefficients are known. Discarding the original data can even be done *while* calculating the coefficients, and it is possible to replace the original data with the corresponding coefficients in the process of prefiltering: the operation can be done 'in place'. vspline can exploit situations where the original data won't be used after the coefficients have been produced, reducing storage requirements to just above what is needed for the original data. This can be done in two ways:

- when first obtaining the original data (like, from a file), they can be immediately placed into a bspline object's 'core', where they are prefiltered in-place. This is the best way, since it requires no further intermediate storage. For an example, see ca_correct.cc, where this way of handling data is used in main(), look for 'reading image'. Another example is in 'gradient.cc', where the original data are produced arithmetically and directly placed into the bspline object's 'core'.

- if the data are already present in memory, they can be 'sucked into' a bspline object during prefiltering. This is achieved by creating the bspline object and passing the data to its prefilter routine. This way is second best: the original data will only be read once, and as soon as the coefficients are ready, they may or may not be discarded. You can find an example for this strategy in n_shift.cc, look for invocations of 'prefilter'.
Of course, if you don't care for maximum performance, you can use a third way, which is the most legible way to code the process:

- Start out with the data in memory and *assign* them to the bspline object's 'core', then call prefilter. Performance-wise this is the worst way of doing it, since the data are read from one part of memory, written to another part (the bspline object's core) and reread from there as soon as prefiltering starts.

Now you might ask why a bspline object shouldn't be created by somehow transforming the raw data. This is due to the fact that vspline holds more coefficients than original data: for efficient evaluation without need for bounds checking, a small 'frame' of additional coefficients is put around the 'core' coefficients. This frame is handled by the bspline object, and the space in memory holding the 'core' coefficients is a 'subarray' of the total amount of coefficients held by the bspline object. It is possible to 'adopt' storage holding original data with a bspline object, but this can only be done if the raw data are already surrounded by a suitably-sized frame of additional storage to accept the 'frame' after prefiltering. If you set up your data so that the 'frame' is present, you can make the spline 'adopt' the data when it is created, but you might as well create the bspline object first and populate its core afterwards, instead of creating, populating and adopting the data.

When you set up a bspline object, you instantiate class bspline with the data type it should use for coefficients. Oftentimes this will be the same data type as your original data. But you can use a different type for the coefficients, as long as the types are convertible. When using a different type, you can still use the overload of 'prefilter' 'sucking in' the original data: they will be converted in the process.
This is especially useful if your source data are of a type with small dynamic range (like 8 bit pixels), which aren't well-suited for arithmetic operations. While you get most precise results with real data types (like float) for b-spline coefficients, you are free to operate with integral b-spline coefficients: per default, all internal calculations (prefiltering and evaluation) are done with suitably 'promoted' data types, so when using integral coefficients, you only suffer from quantization errors. You have freedom of choice here: your original data, the b-spline coefficients, the type used for internal calculations and the result type of the evaluation may all be different, while there are sensible defaults if you don't want to concern yourself with the details. The only constraint is that arithmetics have to be done with a real data type. Note that prefiltering is highly optimized: the operation is multithreaded automatically, and it is also vectorized, provided that the data types you're using can be handled by your processor's vector units. Even when you aren't using Vc, prefiltering will use the CPU's vector units via a process I call 'goading': The data are presented in small aggregates of vector-friendly size, and with the filter code being reasonably trivial, the operation is auto-vectorized if permitted by the given compiler flags (like -mavx). Whichever way of producing the coefficients you may use, eventually you'll have a 'bspline' object holding the coefficients in a form which can be used by vspline's evaluation mechanism. Evaluation involves two steps: - first you create an 'evaluator' object. This is a functor which is created from a bspline object and provides methods to evaluate the spline. - next you use the evaluator to obtain interpolated values. Yet again you might ask why there isn't simply an evaluation method in class bspline itself. 
This is due to the fact that evaluation can be done in different ways, which are specified when the 'evaluator' object is created. And when I say 'evaluator' here, I use the term for a whole set of objects capable of evaluating a b-spline, where the capabilities may go beyond what class vspline::evaluator provides. Consider, for example, the factory functions make_evaluator() and make_safe_evaluator() in eval.h. They create functors which contain vspline::evaluator objects specialized for the task at hand, and extra arithmetics on the input for 'safe' evaluators - and they return the functors as type-erased objects in a common type. Once an evaluator is ready, it can be used without further specialization: it's immutable after creation and constitutes a 'pure' functor in a functional-programming sense. It is an object which can be passed around (it's trivially copyable) even between several threads. Creating an evaluator is a cheap operation, so the strategy is to create an evaluator specific to the task at hand and use this specific evaluator for the specific task only. Evaluator objects are also lightweight, so having many evaluators does not use much memory. Evaluators refer to the coefficients, and having an evaluator for a given b-spline object does not automatically keep the coefficients 'alive'. If you destroy the bspline object, evaluators referring to the coefficients hold dangling pointers and using them will produce undefined behaviour. It's your responsibility to keep the bspline object alive until the coefficients won't be used anymore. vspline uses RAII, and the way to create evaluators fits this scheme. Typically you'd create the evaluator in some inner scope just before it is used, letting it perish just afterwards when leaving the scope. There is a third aspect to the strategy proposed by vspline. 
While the obvious way of using an evaluator would be to write a loop in which the evaluator is called repeatedly, this is only a good idea if performance is not an issue. If, on the other hand, you have large amounts of data to process, you want to employ methods like multithreading and vectorization to get better performance. You are of course free to use whatever multithreading and vectorization code you like, but you might as well use what vspline offers: I call it 'wielding' code. In an evaluation context, it means applying functors to large data sets. vspline offers code for the purpose which automatically multithreads and vectorizes the operation of evaluating a b-spline at many locations. You can pass an array full of coordinates and receive an array of values, an operation called 'remapping'. vspline even goes further and uses an abstraction of this process: it has code to feed large amounts of data to a functor, where the effect is that of processing all data in turn, while the operation is in fact multithreaded and vectorized automatically. The result is stored in a (possibly multidimensional) array. A 'classic' remap is merely a special case of this procedure, which, in vspline, is done by vspline::transform().

When coding geometric transformations, the task can often be reduced to formulating a function which, given a 'target' coordinate - a coordinate into the result array - can produce a 'source' coordinate, namely the locus of the interpolation. For such situations, vspline offers an overload of vspline::transform which feeds the functor with discrete target coordinates. If the functor is merely an evaluator, the result will be a 'restoration' of the original data. But if you use a functor transforming the 'incoming' coordinate and feeding the transformed coordinate to an evaluator, the result will be a geometrically transformed image, volume, etc... Creating such functors is actually quite simple, and the examples have code to show how it's done.
Have a look at gsm.cc, which produces an image holding the gradient squared magnitude of an input image. There, the target pixels are created one by one in a loop. Now look at gsm2.cc. The result is the same, but here it is obtained by coding the 'interesting' bit as a functor derived from vspline::unary_functor and the coordinate-fed overload of vspline::transform is used. You can see that the coding is quite straightforward. What is not immediately apparent is the fact that doing it this way is also much faster: the operation is automatically multithreaded and, possibly, vectorized. For a single image, the difference is not a big deal, but when you're producing images for real-time display, every millisecond counts. My use of functors in vspline isn't as straightforward as using, say, std::function. This is due to the fact that the functors used in vspline can process both unvectorized and vectorized data. The relevant 'eval' routines provide the dual functionality either as two distinct overloads or as a template if the operation can be coded as one. This is why I had to code my own version of the subset of functional programming constructs used in vspline, rather than using some ready-made expression template code. There is one last step which I'll mention briefly. Suppose you have a bspline object and you want to recreate your original data from the coefficients it holds. vspline offers functions called 'restore' for the purpose. Restoration will only be possible within arithmetic precision - if you are using integral data, you may suffer from quantization errors, with real data, the restored data will be very close to the original data. Restoration is done with a separable convolution, which is multithreaded and vectorized automatically, just like prefiltering or transformations, so it's also very fast. 
vspline's prefiltering code, which is simply a forward-backward recursive n-pole filter, and the convolution code used for restoration, can be used to apply other filters than the b-spline prefilter or restoration, with the same benefits of multithreading and vectorization, requiring little more than the specification of the filter itself, like the convolution kernel to use. And vspline's 'range-based multithreading' is also commodified so that you can use it for your own purposes with just a few lines of code, see a simple example in eval.cc, where the speed test is run with a thread pool to get a measurement of the overall evaluation performance. How about processing 1D data like sounds or simple time series? vspline has specialized code for the purpose. While simply iterating over the 1D sequence is trivial, it's slow. So vspline's specialized 1D code uses mathematical tricks to process the 1D data with multiple threads and vectorization just as images or volumes. These specializations exist for prefiltering, transformations and restoration, which are therefore just as fast as for nD data, and, as far as the filters are concerned, even faster, since the filter has to be applied along one axis only. As a user, again you don't have to 'do' anything, the 1D code is used automatically if your data are 1D. The mathematical trickery only fails if you require 'zero tolerance' for the coefficient generation, which forces the data to be prefiltered en bloc - an option which only makes sense in rare circumstances. vspline will accept high-dimension data, but when dimensionality gets very high, vspline's use of 'braced' coefficient arrays has some drawbacks: with increasing dimensionality, the 'surface' of an array grows very large, and the 'brace' around the coefficients is just such a surface, possibly several units wide. So with high-D splines, the braced coefficient array is (potentially much) larger than the knot point data. 
You've been warned ;)

An important specialization is code for degree-0 and degree-1 b-splines. These are old acquaintances in disguise: a degree-0 b-spline is simply another name for nearest-neighbour interpolation, and its degree-1 cousin encodes linear interpolation. While the 'general-purpose' b-spline evaluation code can be used to evaluate degree-0 and degree-1 b-splines, doing so wastes some potential for optimization. vspline's evaluator objects can be 'specialized' for degree-0 and degree-1 splines, and when you're using the factory functions like make_evaluator, this is done automatically. With the optimizations for low-degree splines and 1D data, I feel that the 'problem' at hand - bspline processing - is dealt with exhaustively, leaving no use scenario with second-best solutions but providing good performance throughout.

\section quickstart_sec Quickstart

vspline uses vigra to handle data. There are two vigra data types which are used throughout: vigra::MultiArrayView is used for multidimensional arrays. It's a thin wrapper around the three parameters needed to describe an arbitrary n-dimensional array of data in memory: a pointer to some 'base' location coinciding with the coordinate origin, a shape and a stride. If your code does not use vigra MultiArrayViews, it's easy to create them for the data you have: vigra offers a constructor for MultiArrayViews taking these three parameters. Very similar is vigra's MultiArray, which is a MultiArrayView owning and allocating the data it refers to.

The other vigra data type used throughout vspline is vigra::TinyVector, a small fixed-size container type used to hold things like multidimensional coordinates or pixels. This type is also just a wrapper around a small 1D C array. It's zero overhead and contains nothing else, but offers lots of functionality like arithmetic operations.
I recommend looking into vigra's documentation to get an idea about these data types, even if you only wrap your extant data in them to interface with vspline.

vspline follows vigra's default axis ordering scheme: the fastest-varying index is first, so coordinates are (x,y,z...). This is important to keep in mind when evaluating a b-spline: always pass the x coordinate first.

Coordinates, strides and shapes are given in units of the MultiArrayView's value_type. So if you have a MultiArrayView of, say, float RGB pixels, a stride of one means an offset of ( 3 * sizeof ( float ) ) in bytes. This is important to note, since there is no standard for expressing strides - some systems use bytes, some use the elementary type. vigra uses the array's value_type for the unit step.

If you stick with the high-level code, using class bspline or the transform functions, most of the parametrization is easy. Here are a few quick examples of what you can do. This is really just to give you a first idea - there is example code covering most features of vspline, where things are covered in more detail, with plenty of comments. The code in this text is also there, see quickstart.cc.

Let's suppose you have data in a 2D vigra MultiArray 'a'. vspline can handle integer and real data like float and double, and also their 'aggregates', meaning data types like pixels or vigra's TinyVectors. But for now, let's assume you have plain float data. Creating the bspline object is easy:

~~~~~~~~~~~~~~
#include <vspline/vspline.h>

// given a 10 X 20 vigra::MultiArray of data

vigra::MultiArray < 2 , float > a ( 10 , 20 ) ;

// let's initialize the whole array with 42

a = 42 ;

// fix the type of the corresponding b-spline. We want a bspline
// object with float coefficients and two dimensions:

typedef vspline::bspline < float , 2 > spline_type ;

// create bspline object 'bspl' fitting the shape of your data

spline_type bspl ( a.shape() ) ;

// copy the source data into the bspline object's 'core' area

bspl.core = a ;

// run prefilter() to convert original data to b-spline coefficients

bspl.prefilter() ;
~~~~~~~~~~~~~~

The memory needed to hold the coefficients is allocated when the bspline object is constructed.

Obviously many things have been done by default here: The default spline degree was used - it's 3, for a cubic spline. Also, boundary treatment mode 'MIRROR' was used per default. The spline is 'braced' so that it can be evaluated with vspline's evaluation routines, and the process is automatically partitioned and run in parallel by a thread pool. The only mandatory template arguments are the value type and the number of dimensions, which have to be known at compile time.

While the sequence of operations indicated here looks a bit verbose (why not create the bspline object by a call like bspl(a) ?), in 'real' code you would use bspl.core straight away as the space to contain your data - you might get the data from a file or by some other process, or do something like this, where the bspline object provides the array and you interface it via a view to its 'core':

~~~~~~~~~~~~~~
vspline::bspline < double , 1 > bsp ( 10001 , degree , vspline::MIRROR ) ;

auto v1 = bsp.core ; // get a view to the bspline's 'core'

for ( auto & r : v1 )
  r = ... ; // assign some values

bsp.prefilter() ; // perform the prefiltering
~~~~~~~~~~~~~~

This is a common idiom, because it reflects a common mode of operation where you don't need the original, unfiltered data any more after creating the spline, so the prefiltering is done in-place, overwriting the original data.
If you do need the original data later, you can also use a third idiom:

~~~~~~~~~~~~~~
vigra::MultiArray < 3 , double > my_data ( vigra::Shape3 ( 5 , 6 , 7 ) ) ;

vspline::bspline < double , 3 > bsp ( my_data.shape() ) ;

bsp.prefilter ( my_data ) ;
~~~~~~~~~~~~~~

Here, the bspline object is first created with the appropriate 'core' size, and prefilter() is called with an array matching the bspline's core. This results in my_data being read into the bspline object during the first pass of the prefiltering process.

There are more ways of setting up a bspline object, please refer to class bspline's constructor. Of course you are also free to directly use vspline's lower-level routines to create a set of coefficients. The lowest level of filtering routine is simply a forward-backward recursive filter with a set of arbitrary poles. This code is in prefilter.h.

Next you may want to evaluate the spline from the first example at some pair of coordinates x, y. Evaluation of the spline can be done using vspline's 'evaluator' objects. Using the highest level of access, these objects are set up with a bspline object and, after being set up, provide methods to evaluate the spline at given coordinates. Technically, evaluator objects are functors which don't have mutable state (all state is created at construction time and remains constant afterwards), so they are thread-safe and 'pure' in a functional-programming sense.

The evaluation is done by calling the evaluator's eval() member function, which takes its first argument (the coordinate) as a const reference and writes the result to its second argument, which is a reference to a variable capable of holding the result.
~~~~~~~~~~~~~~
// for a 2D spline, we want 2D coordinates

typedef vigra::TinyVector < float , 2 > coordinate_type ;

// get the appropriate evaluator type

typedef vspline::evaluator < coordinate_type , float > eval_type ;

// create the evaluator

eval_type ev ( bspl ) ;

// create variables for input and output

coordinate_type coordinate ( 3 , 4 ) ;
float result ;

// use the evaluator to evaluate the spline at ( 3 , 4 ),
// storing the result in 'result'

ev.eval ( coordinate , result ) ;
~~~~~~~~~~~~~~

Again, some things have happened by default: the evaluator was constructed from a bspline object, making sure that the evaluator is compatible.

Class evaluator also provides operator(), as do all the other functors vspline can generate. This is for convenience, so you can use vspline's unary_functors like 'normal' functions:

~~~~~~~~~~~~~
// evaluators can be called as a function

auto r = ev ( coordinate ) ;
assert ( r == result ) ;
~~~~~~~~~~~~~

vspline offers a limited set of functional programming constructs - as of this writing, it provides just those constructs which it uses itself, usually in factory functions. You can find the functional constructs in unary_functor.h. While it's tempting to implement more along the lines of expression templates, I have tried to keep things limited to a comfortable minimum. Most of the time user code may remain ignorant of the functional programming aspects - the functional constructs obtained from the factory functions can just be assigned to auto variables; it's usually not necessary to make these types explicit.

What about the remap function? The little introduction demonstrated how you can evaluate the spline at a single location. Most of the time, though, you'll require evaluation at many coordinates. This is what remap does.
Instead of a single coordinate, you pass a whole vigra::MultiArrayView full of coordinates to it - and another MultiArrayView of the same dimension and shape to accept the results of evaluating the spline at every coordinate in the first array. Here's a simple example, using the same array 'a' as above:

~~~~~~~~~~~~
// create a 1D array containing three (2D) coordinates

vigra::MultiArray < 1 , coordinate_type > coordinate_array ( 3 ) ;

// we initialize the coordinate array by hand...

coordinate_array = coordinate ;

// create an array to accommodate the result of the remap operation

vigra::MultiArray < 1 , float > target_array ( 3 ) ;

// perform the remap

vspline::remap ( a , coordinate_array , target_array ) ;

// reassure yourself the result is correct

auto ic = coordinate_array.begin() ;
for ( auto k : target_array )
  assert ( k == ev ( *(ic++) ) ) ;
~~~~~~~~~~~~

This is an 'ad-hoc' remap, passing source data as an array. You can also set up a bspline object and perform a 'transform' using an evaluator for this bspline object, with the same effect:

~~~~~~~~~~~~
// instead of the remap, we can use transform, passing the evaluator for
// the b-spline over 'a' instead of 'a' itself. the result is the same.

vspline::transform ( ev , coordinate_array , target_array ) ;
~~~~~~~~~~~~

This routine has wider scope: while in this example, ev is a b-spline evaluator, ev's type can be any functor capable of yielding a value of the type held in 'target_array' for a value held in 'coordinate_array'. Here, you'd typically use an object derived from class vspline::unary_functor, and vspline::evaluator is in fact derived from this base class. A unary_functor's input and output can be any data type suitable for processing with vspline; you're not limited to things which can be thought of as 'coordinates' etc.
This generalization of remap is named 'transform' and is similar to vigra's point operator code, but uses vspline's automatic multithreading and vectorization to make it very efficient. There's a variation of it where the 'coordinate array' and the 'target array' are the same, effectively performing an in-place transformation, which is useful for things like coordinate transformations or colour space manipulations. This variation is called vspline::apply.

There is one more variation of transform(). This overload doesn't take a 'coordinate array', but instead feeds the unary_functor with the discrete coordinates of the target location that is being filled in. It's probably easiest to understand this variant if you start out thinking of feeding the previous transform() with an array which contains discrete indices. In 2D, this array would contain

(0,0) , (1,0) , (2,0) ...
(0,1) , (1,1) , (2,1) ...
...
So why would you set up such an array, if it merely contains the coordinates of every cell? You might as well create these values on-the-fly and omit the coordinate array. This is precisely what the second variant of transform does:

~~~~~~~~~~~~~
// create a 2D array for the result of the index-based transform operation

vigra::MultiArray < 2 , float > target_array_2d ( 3 , 4 ) ;

// use transform to evaluate the spline for the coordinates of
// all values in this array

vspline::transform ( ev , target_array_2d ) ;

// verify

for ( int x = 0 ; x < 3 ; x ++ )
{
  for ( int y = 0 ; y < 4 ; y++ )
  {
    coordinate_type c { x , y } ;
    assert ( target_array_2d [ c ] == ev ( c ) ) ;
  }
}
~~~~~~~~~~~~~

If you use this variant of transform directly with a vspline::evaluator, it will reproduce your original data - within arithmetic precision of the evaluation. While this is one way to restore the original data, there's also a (more efficient) routine called 'restore'.

~~~~~~~~~~~~~
// use an index-based transform to restore the original data to 'b'

vigra::MultiArray < 2 , float > b ( a.shape() ) ;

vspline::transform ( ev , b ) ;

// confirm the restoration has succeeded

auto ia = a.begin() ;
for ( auto r : b )
  assert ( vigra::closeAtTolerance ( *(ia++) , r , .00001 ) ) ;

// now use vspline::restore to restore the original data into 'c'

vigra::MultiArray < 2 , float > c ( a.shape() ) ;

vspline::restore ( bspl , c ) ;

// confirm that both methods produced similar results

auto ib = b.begin() ;
for ( auto & ic : c )
  assert ( vigra::closeAtTolerance ( *(ib++) , ic , .00001 ) ) ;
~~~~~~~~~~~~~

Class vspline::unary_functor is coded to make it easy to implement functors for things like image processing pipelines. For more complex operations, you'd code a functor representing your processing pipeline - often by delegating to 'inner' objects also derived from vspline::unary_functor - and finally use transform() to bulk-process your data with this functor.
This is about as efficient as it gets, since the data are only accessed once, and vspline's transform code does the tedious work of multithreading, deinterleaving and interleaving for you, while you are left to concentrate on the interesting bit, writing the processing pipeline code. vspline::unary_functors are reasonably straightforward to set up; you can start out with scalar code, and you'll see that writing vectorized code with Vc isn't too hard either - if your code doesn't need conditionals, you can often even get away with using the same code for vectorized and unvectorized operation by simply making 'eval' a function template. Please refer to the examples.

vspline offers some functional programming constructs for functor combination, like feeding one functor's output as input to the next (vspline::chain) or translating coordinates to a different range (vspline::domain).

And that's about it - vspline aims to provide all possible variants of b-splines, code to create and evaluate them, and to do so for arbitrarily shaped and strided nD arrays of data. If you dig deeper into the code base, you'll find that you can stray off the default path, but there should rarely be any need not to use the high-level objects 'bspline' and 'evaluator' and the transform-like functions.

While one might argue that the remap/transform routines I present shouldn't be lumped together with the 'proper' b-spline code, I feel that I can only make them really fast by tightly coupling them with the b-spline code. And the hardware can be exploited fully only by processing several values at once (by multithreading and vectorization).

\section speed_sec Speed

While performance will vary from system to system and between different compile(r)s, I'll quote some measurements from my own system. I include benchmarking code (like roundtrip.cc in the examples folder).
Here are some measurements done with "roundtrip", working on a full HD (1920*1080) RGB image, using single precision floats internally - the figures are averages of 32 runs:
testing bc code MIRROR spline degree 3 using SIMD emulation

avg 32 x prefilter:............................ 12.93750 ms
avg 32 x restore original data from 1D:........ 6.03125 ms
avg 32 x transform with ready-made bspline:.... 46.18750 ms
avg 32 x restore original data: ............... 15.90625 ms

testing bc code MIRROR spline degree 3 using Vc

avg 32 x prefilter:............................ 13.12500 ms
avg 32 x restore original data from 1D:........ 5.50000 ms
avg 32 x transform with ready-made bspline:.... 21.40625 ms
avg 32 x restore original data: ............... 10.15625 ms
As can be seen from these test results, using Vc on my system speeds evaluation up a good deal. When it comes to prefiltering, a lot of time is spent buffering data to make them available for fast vector processing. The time spent on actual calculations is much less. Therefore prefiltering for higher-degree splines doesn't take much more time:
testing bc code MIRROR spline degree 5 using Vc

avg 32 x prefilter:............................ 13.59375 ms

testing bc code MIRROR spline degree 7 using Vc

avg 32 x prefilter:............................ 15.00000 ms
Using double precision arithmetic, vectorization doesn't help as much. Note that it is entirely possible to work in long double, but since these operations can't be vectorized in hardware, this is slow.

vspline will - by default - use vector code for all operations. If hardware vectorization can't be used, vspline will use code to emulate the hardware vectorization, and use data types which are compatible with 'proper' SIMD data. This way, having stages in a processing pipeline using types which can't be vectorized in hardware is no problem and will not force the whole pipeline to be run in scalar code. To have vspline use scalar code, you can fix the vectorization width at 1, and at times this may even produce faster code. Again you'll have to try out what best suits your needs.

\section design_sec Design

You can probably do everything vspline does with other software - there are several freely available implementations of b-spline interpolation and remap/transform routines. What I wanted to create was an implementation which was as general as possible and at the same time as fast as possible, and, on top of that, comprehensive and easy to use. These demands are not easy to satisfy at the same time, but I feel that my design comes close.

While generality is achieved by generic programming, speed needs exploitation of hardware features, and merely relying on the compiler is not enough. The largest speedup I saw was from multithreading the code. This may seem like a trivial observation, but my design is influenced by it: in order to multithread efficiently, the problem has to be partitioned so that it can be processed by independent threads. You can see the partitioning both in prefiltering and later in the transform routines.

Another speedup method is data-parallel processing. This is often thought to be the domain of GPUs, but modern CPUs also offer it in the form of vector units.
I chose to implement data-parallel processing in the CPU, as it offers tight integration with unvectorized CPU code. It's almost familiar terrain, and the way from writing conventional CPU code to vector unit code is not too far. Using horizontal vectorization does require some rethinking, though - mainly a conceptual shift from an AoS to an SoA approach. vspline doesn't use vertical vectorization at all, so the code may look odd to someone looking for vector representations of, say, pixels: instead of finding SIMD vectors with three elements, there are structures of three SIMD vectors of vsize elements.

I chose to code so that vectorization is manifest in the program's structure. The specific mode of vectorization - emulation or explicit vectorization - can be chosen at compile time. While I implemented the vectorized code, I noticed that vectorization is, to a high degree, something that expresses itself in the code's structure: the data have to be 'presented' as SoA of vector-friendly size. If this is done, use of explicit vector code may not even be necessary: the structure of the data flow implies vectorization, and if the implicit vectorization is expressed in a way the compiler can 'understand', it will result in vector code produced by autovectorization.

To use vectorized evaluation efficiently, incoming data have to be presented to the evaluation code in vectorized form, but usually they will come from interleaved memory. Keeping the data in interleaved memory is even desirable, because it preserves locality, and usually processing accesses all parts of a value (i.e. all three channels of an RGB value) at once. After the evaluation is complete, data have to be stored again to interleaved memory. The deinterleaving and interleaving operations take time, and the best strategy is to load once from interleaved memory, perform all necessary operations on deinterleaved, vectorized data, and finally store the result back to interleaved memory.
The sequence of operations performed on the vectorized data constitutes a processing pipeline, and some data access code will feed the pipeline and dispose of its result. vspline's unary_functor class is designed to occupy the niche of pipeline code, while remap, apply and transform provide the feed-and-dispose code - a task which I like to call 'wielding'. So with the framework of these routines, setting up vectorized processing pipelines becomes easy, since all the boilerplate code is there already, and only the point operations/operations on single vectors need to be provided by deriving from unary_functor.

Using all these techniques together makes vspline fast. The target I was roughly aiming at was to achieve frame rates of ca. 50 fps in RGB and full HD, producing the images via transform from a precalculated coordinate array. On my system, I have almost reached that goal - my transform times are around 25 msec (for a cubic spline), and with memory access etc. I come up to frame rates over half of what I was aiming at.

My main testing ground is pv, my panorama viewer. Here I can often take the spline degree up to two (a quadratic spline) and still have smooth animation in full HD. Using pv has another benefit: it makes it possible to immediately *see* the results of vspline's operation. If anything is amiss, it'll likely be visible.

Even without using Vc, the code is certainly fast enough for most purposes. This way, vigra becomes the only dependency.

\section Literature

There is a large amount of literature on b-splines available online.
Here's a pick:

http://bigwww.epfl.ch/thevenaz/interpolation/

http://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/spline/B-spline/bspline-basis.html

http://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/spline/B-spline/bspline-ex-1.html

*/

/************************************************************************/
/*                                                                      */
/*    vspline - a set of generic tools for creation and evaluation      */
/*              of uniform b-splines                                    */
/*                                                                      */
/*            Copyright 2015 - 2018 by Kay F. Jahnke                    */
/*                                                                      */
/*    The git repository for this software is at                        */
/*                                                                      */
/*    https://bitbucket.org/kfj/vspline                                 */
/*                                                                      */
/*    Please direct questions, bug reports, and contributions to        */
/*                                                                      */
/*    kfjahnke+vspline@gmail.com                                        */
/*                                                                      */
/*    Licensed under the MIT license reproduced in the LICENSE file.    */
/*                                                                      */
/************************************************************************/

/*! \file eval.h

    \brief code to evaluate uniform b-splines

    This body of code contains class evaluator and auxiliary classes
    which are needed for its smooth operation.

    The evaluation is a reasonably straightforward process: A subset of
    the coefficient array, containing coefficients 'near' the point of
    interest, is picked out, and a weighted summation over this subset
    produces the result of the evaluation. The complex bit is to have
    the right coefficients in the first place (this is what prefiltering
    does), and to use the appropriate weights on the coefficient window.
    For b-splines, there is an efficient method to calculate the weights
    by means of a matrix multiplication, which is easily extended to
    handle b-spline derivatives as well. Since this code lends itself to
    a generic implementation, and it can be parametrized by the spline's
    order, and since the method performs well, I use it here in
    preference to the code which P. Thevenaz uses (which is, for the
    orders of splines it encompasses, the matrix multiplication written
    out with a few optimizations, like omitting multiplications with
    zero, and slightly more concise calculation of powers). The weight
    generation code is in basis.h.

    Evaluation of a b-spline seems to profit more from vectorization
    than prefiltering, especially for float data. On my system, I found
    single-precision operation was about three to four times as fast as
    unvectorized code (AVX2).

    The central class of this file is class evaluator. evaluator objects
    are set up to provide evaluation of a specific b-spline. Once they
    are set up they don't change and effectively become pure functors.
    The evaluation methods typically take their arguments per reference.
    The details of the evaluation variants, together with explanations
    of specializations used for extra speed, can be found with the
    individual evaluation routines.
'Ordinary' call syntax via operator() is also provided for convenience. What do I mean by the term 'pure' functor? It's a concept from functional programming. It means that calling the functor will not have any effect on the functor itself - it can't change once it has been constructed. This has several nice effects: it can potentially be optimized very well, it is thread-safe, and it will play well with functional programming concepts - and it's conceptually appealing. Code using class evaluator will probably use it at some core place where it is part of some processing pipeline. An example would be an image processing program: one might have some outer loop generating arguments (typically simdized types) which are processed one after the other to yield a result. The processing will typically have several stages, like coordinate generation and transformations, then use class evaluator to pick an interpolated intermediate result, which is further processed by, say, colour or data type manipulations before finally being stored in some target container. The whole processing pipeline can be coded to become a single functor, with one of class evaluator's eval-type routines embedded somewhere in the middle, and all that's left is code to efficiently handle the source and destination to provide arguments to the pipeline - like the code in transform.h. And since this code is made to provide the data feeding and storing, the only coding needed is for the pipeline, which is where the 'interesting' stuff happens. class evaluator is the 'front' class for evaluation, the implementation of the functionality is in class inner_evaluator. User code will typically not use class inner_evaluator, which lives in namespace vspline::detail. The code in this file concludes by providing factory functions to obtain evaluators for vspline::bspline objects. 
These factory functions produce objects which are type-erased (see vspline::grok_type) wrappers around the evaluators, which hide what's inside and simply provide evaluation of the spline at given coordinates. These objects also provide operator(), so that they can be used like functions. Passing vectorized coordinates results in vectorized results, where the specific types for the vectorized input and output are gleaned from vspline::vector_traits. If Vc is used and the fundamental data types can be made into a Vc::SimdArray, the data types for vectorized input/output will be Vc::SimdArrays (or vigra::TinyVectors of Vc::SimdArrays for multichannel data). Otherwise, vspline's SIMD emulation will be used, which replaces Vc::SimdArray with vspline::simd_tv, which is an autovectorization-friendly type with similar performance.

Since the objects produced by the factory functions are derived from vspline::unary_functor, they can be fed to the functions in transform.h, like any other vspline::unary_functor. If you use vspline::transform and relatives, vectorization is done automatically: the transform routines will inquire for the functor's vectorized signature, which is encoded as its in_v and out_v types, which are - normally - results of querying vspline::vector_traits. The data are then deinterleaved into the vectorized input type, fed to the functor, and the vectorized result is interleaved into target memory. class evaluator has all the relevant attributes and capabilities, so using it with transform and relatives is easy and you need not be aware of any of the 'vector magic' going on internally - nor of the automatic multithreading. See transform.h for more on the topic. */

#ifndef VSPLINE_EVAL_H
#define VSPLINE_EVAL_H

#include "basis.h"
#include "bspline.h"
#include "unary_functor.h"
#include "map.h"

namespace vspline {

// we define the default fundamental type used for mathematical operations.
// starting out with the 'better' of coordinate_type's and value_type's // elementary type, we take this type's RealPromote type to make sure we're // operating in some real type. With a real math_type we can operate on // integral coefficients/values and only suffer from quantization errors, // so provided the dynamic range of the integral values is 'sufficiently' // large, this becomes an option - as opposed to operating in an integral // type which is clearly not an option with the weights being in [0..1]. template < typename coordinate_type , typename value_type > using default_math_type = typename vigra::NumericTraits < typename vigra::PromoteTraits < typename vigra::ExpandElementResult < coordinate_type > :: type , typename vigra::ExpandElementResult < value_type > :: type > :: Promote > :: RealPromote ; namespace detail { /// 'inner_evaluator' implements evaluation of a uniform b-spline. /// While class evaluator (below, after namespace detail ends) /// provides objects derived from vspline::unary_functor which /// are meant to be used by user code, here we have a 'workhorse' /// object to which 'evaluator' delegates. /// /// The template arguments are, first, the elementary types (e.t.) /// of the types involved, then several non-type template arguments /// fixing aggregate sizes. inner_evaluator only uses 'synthetic' /// types devoid of any specific meaning, so while class evaluator /// might accept types like 'std::complex' or 'double' /// class inner_evaluator would instead accept the synthetic types /// vigra::TinyVector and vigra::TinyVector. /// /// Note the name system used for the types. first prefixes: /// - ic: integral coordinate /// - rc: real coordinate /// - ofs: offset in memory /// - cf: b-spline coefficient /// - math: used for calculations /// - trg: type for result ('target') of the evaluation /// an infix of 'ele' refers to a type's elementary type. 
/// suffix 'type' is for unvectorized types, while suffix 'v' /// is used for simdized types, see below. /// /// User code will not usually create and handle objects of class /// inner_evaluator, but to understand the code, it's necessary to /// know the meaning of the template arguments for class inner_evaluator: /// /// - _ic_ele_type: e.t. of integral coordinates. integral coordinates occur /// when incoming real coordinates are split into an integral part and a /// remainder. Currently, only int is used. /// /// - _rc_ele_type: e.t. of real coordinates. This is used for incoming real /// coordinates and the remainder mentioned above /// /// - _ofs_ele_type: e.t. for offsets (in memory). This is used to encode /// the location of specific coefficients relative to the coefficients' /// origin in memory. Currently, only int is used. /// /// - _cf_ele_type: elementary type of b-spline coefficients. While in most /// cases, this type will be the same as the elementary type of the knot /// point data the spline is built over, it may also be different. See /// class bspline's prefilter method. /// /// - _math_ele_type: e.t. for mathematical operations. All arithmetic /// inside class inner_evaluator is done with this elementary type. /// It's used for weight generation, coefficients are cast to it, and /// only after the result is complete, it's cast to the 'target type' /// /// - _trg_ele_type: e.t. of target (result). Since class inner_evaluator /// normally receives it's 'target' per reference, this template argument /// fixes the target's type. This way, after the arithmetic is done, the /// result is cast to the target type and then assigned to the target /// location. /// /// - _dimension: number of dimensions of the spline, and therefore the /// number of components in incoming coordinates. If the spline is 1D, /// coordinates will still be contained in vigra::TinyVectors, with /// only one component. /// /// - _channels: number of channels per coefficient. 
///   So, when working on RGB pixels, the number of channels would be three.
///   If the spline is over fundamentals (float, double...), _channels is one
///   and the type used here for coefficients is a vigra::TinyVector with one
///   element.
///
/// - _specialize: class inner_evaluator has specialized code for
///   degree-0 and degree-1 b-splines, aka nearest neighbour and linear
///   interpolation. This specialized code can be activated by passing
///   0 or 1 here, respectively. All other values will result in the use
///   of general b-spline evaluation, which can handle degree-0 and
///   degree-1 splines as well, but less efficiently.

template < typename _ic_ele_type ,    // e.t. of integral coordinates
           typename _rc_ele_type ,    // e.t. of real coordinates
           typename _ofs_ele_type ,   // e.t. for offsets (in memory)
           typename _cf_ele_type ,    // elementary type of coefficients
           typename _math_ele_type ,  // e.t. for mathematical operations
           typename _trg_ele_type ,   // e.t. of target (result)
           unsigned int _dimension ,  // dimensionality of the spline
           unsigned int _channels ,   // number of channels per coefficient
           int _specialize            // specialize for NN, linear, general
         >
struct inner_evaluator
{
  // make sure math_ele_type is a floating point type

  static_assert ( std::is_floating_point < _math_ele_type > :: value ,
                  "class evaluator requires a floating point math_ele_type" ) ;

  // pull in the template arguments in order to allow other code to
  // inquire about the type system

  typedef _ic_ele_type ic_ele_type ;
  typedef _rc_ele_type rc_ele_type ;
  typedef _ofs_ele_type ofs_ele_type ;
  typedef _cf_ele_type cf_ele_type ;
  typedef _math_ele_type math_ele_type ;
  typedef _trg_ele_type trg_ele_type ;

  enum { dimension = _dimension } ;
  enum { level = dimension - 1 } ;
  enum { channels = _channels } ;
  enum { specialize = _specialize } ;

  // define the 'synthetic' unvectorized types, which are always
  // TinyVectors, possibly with only one element.
  // Note how this process of 'synthesizing' the types is in a way the
  // opposite process of what's done in class evaluator, where the template
  // arguments are 'taken apart' to get their elementary types.

  typedef vigra::TinyVector < ic_ele_type , dimension > ic_type ;
  typedef vigra::TinyVector < rc_ele_type , dimension > rc_type ;
  typedef ofs_ele_type ofs_type ; // TODO: superfluous?
  typedef vigra::TinyVector < cf_ele_type , channels > cf_type ;
  typedef vigra::TinyVector < math_ele_type , channels > math_type ;
  typedef vigra::TinyVector < trg_ele_type , channels > trg_type ;
  typedef vigra::TinyVector < std::ptrdiff_t , dimension > shape_type ;
  typedef vigra::TinyVector < int , dimension > derivative_spec_type ;

private:

  typedef typename vigra::MultiArrayView < 1 , const cf_ele_type * >
                   :: const_iterator cf_pointer_iterator ;

  /// Initially I was using a template argument for this flag, but it turned
  /// out that using a const bool set at construction time performs just as
  /// well. Since this makes using class evaluator easier, I have chosen to
  /// go this way.

  const bool even_spline_degree ;

  /// memory location and layout of the spline's coefficients. Note that the
  /// pointer points to the elementary type, and the stride is given in units
  /// of the elementary type as well (hence the 'e' after the underscore).

  const cf_ele_type * const cf_ebase ;
  const shape_type cf_estride ;

  /// cf_pointers holds the sum of cf_ebase and window offsets. This produces
  /// a small performance gain: instead of passing the coefficients' base
  /// address (cf_ebase) and the series of offsets (cf_offsets) into the
  /// workhorse evaluation code and adding successive offsets to cf_ebase,
  /// we do the addition in the constructor and save the offsetted pointers.

  vigra::MultiArray < 1 , std::ptrdiff_t > cf_offsets ;
  vigra::MultiArray < 1 , const cf_ele_type * > cf_pointers ;

  /// To perform the weighted summation over a set of coefficients, we need
  /// to access the b-spline basis function.
  /// basis.h has class 'basis_functor' which has an operator() overload to
  /// provide just the sets of weights needed for evaluation with a given
  /// delta, delta being the small remainder left after splitting
  /// coordinates. There may be a different basis functor for each dimension
  /// (namely when derivatives are calculated), so we keep a TinyVector of
  /// basis_functors, one for each dimension.
  /// Note how, by instantiating basis_functor with math_ele_type, we
  /// specify that the weight generation is to be done with maths using this
  /// type, not with basis_functor's default math_type, which would be the
  /// most precise type vspline can provide: the most precise type is
  /// currently 'long double' and operating in long double would slow
  /// everything down a lot.

  typedef typename vspline::basis_functor < math_ele_type > bf_type ;

  vigra::TinyVector < bf_type , dimension > wgt ;

public:

  const int spline_degree ;
  const int spline_order ;

  /// size of the window of coefficients contributing to a single
  /// evaluation. This equals 'spline_order' to the power of 'dimension'.

  const int window_size ;

  /// split function. This function is used to split incoming real
  /// coordinates into an integral and a remainder part, which are used at
  /// the core of the evaluation. selection of even or odd splitting is done
  /// via the const bool flag 'even_spline_degree'. My initial
  /// implementation had this flag as a template argument, but this way it's
  /// more flexible and there seems to be no runtime penalty. This method
  /// delegates to the free function templates even_split and odd_split,
  /// respectively, which are defined in basis.h.
  template < class IT , class RT >
  void split ( const RT& input , IT& select , RT& tune ) const
  {
    if ( even_spline_degree )
      even_split ( input , select , tune ) ;
    else
      odd_split ( input , select , tune ) ;
  }

  const int & get_order() const
  {
    return spline_order ;
  }

  const int & get_degree() const
  {
    return spline_degree ;
  }

  const shape_type & get_estride() const
  {
    return cf_estride ;
  }

  /// inner_evaluator only has a single constructor, which takes these
  /// arguments:
  ///
  /// - _cf_ebase: pointer to the origin of the coefficient array, expressed
  ///   as a pointer to the coefficients' elementary type. 'origin' here
  ///   means the memory location coinciding with the origin of the knot
  ///   point data, which coincides with a bspline object's 'core', not the
  ///   origin of a bspline object's 'container'. Nevertheless, the data
  ///   have to be suitably 'braced' - evaluation may well fail
  ///   (spectacularly) if the brace is absent, please refer to class
  ///   bspline's documentation.
  ///
  /// - _cf_estride: the stride(s) of the coefficient array, expressed in
  ///   units of the coefficients' elementary type.
  ///
  /// - _spline_degree: the degree of the b-spline. this can be up to 45
  ///   currently. See the remarks on 'shifting' in the documentation of
  ///   class evaluator below.
  ///
  /// - _derivative: pass values other than zero here for an axis for which
  ///   you want the derivative. Note that passing non-zero values for
  ///   several axes at the same time will likely not give you the result
  ///   you intend: The evaluation proceeds from the highest dimension to
  ///   the lowest (z..y..x), obtaining the weights for the given axis by
  ///   calling the basis_functor assigned to that axis. If any of these
  ///   basis_functor objects provides weights to calculate a derivative,
  ///   subsequent processing for another axis with a basis_functor yielding
  ///   weights for derivatives would calculate the derivative of the
  ///   derivative, which is not what you would normally want.
  /// So for multidimensional data, use a derivative specification only for
  /// one axis. If necessary, calculate several derivatives separately (each
  /// with their own evaluator), then multiply. See gsm.cc for an example.

  inner_evaluator ( const cf_ele_type * const _cf_ebase ,
                    const shape_type & _cf_estride ,
                    int _spline_degree ,
                    derivative_spec_type _derivative
                      = derivative_spec_type ( 0 ) )
  : cf_ebase ( _cf_ebase ) ,
    cf_estride ( _cf_estride ) ,
    spline_degree ( _spline_degree ) ,
    even_spline_degree ( ! ( _spline_degree & 1 ) ) ,
    spline_order ( _spline_degree + 1 ) ,
    wgt { bf_type ( _spline_degree , 0 ) } ,
    window_size ( std::pow ( _spline_degree + 1 , int(dimension) ) )
  {
    // if any derivatives are used, reinitialize the weight functors where
    // a derivative other than 0 is requested. TODO potentially slightly
    // wasteful, if no derivative-0 evaluation is done or duplicate weight
    // functors are produced. But setting up the weight functors is only
    // expensive for high derivatives, so I consider this a corner case and
    // ignore it for now.

    for ( int d = 0 ; d < dimension ; d++ )
    {
      if ( _derivative[d] )
      {
        // TODO maybe throwing an exception here is too harsh: might slot
        // in a functor producing 0 for all arguments instead

        if ( _derivative[d] >= _spline_degree )
          throw ( std::invalid_argument
            ( "derivative must be lower than spline degree" ) ) ;

        wgt[d] = bf_type ( _spline_degree , _derivative[d] ) ;
      }
    }

    // The evaluation forms a weighted sum over a window of the coefficient
    // array. The sequence of offsets we calculate here is the set of
    // pointer differences from the central element in that window to each
    // element in the window. It's another way of coding this window, where
    // all index calculations have already been done beforehand rather than
    // performing them during the traversal of the window by means of
    // stride/shape arithmetic.
    // we want to iterate over all nD indexes in a window which has equal
    // extent of spline_order in all directions (like the reconstruction
    // filter's kernel), relative to a point which is spline_degree/2 away
    // from the origin along every axis. This sounds complicated but is
    // really quite simple: For a cubic b-spline over 2D data we'd get
    //
    // (-1,-1) , (0,-1), (1,-1), (2,-1)
    // (-1, 0) , (0, 0), (1, 0), (2, 0)
    // (-1, 1) , (0, 1), (1, 1), (2, 1)
    // (-1, 2) , (0, 2), (1, 2), (2, 2)
    //
    // for the indexes, which are subsequently multiplied with the strides
    // and summed up to obtain 1D offsets instead of nD coordinates. So if
    // the coefficient array has strides (10,100) and the coefficients are
    // single-channel, the sequence of offsets generated is
    //
    // -110, -100, -90, -80,  // which is -1 * 10 + -1 * 100, 0 * 10 + -1 * 100 ...
    //  -10,    0,  10,  20,
    //   90,  100, 110, 120,
    //  190,  200, 210, 220

    shape_type window_shape ( spline_order ) ;
    vigra::MultiCoordinateIterator < dimension > mci ( window_shape ) ;

    // cf_pointers will hold the sums of cf_ebase and the offsets into the
    // window of participating coefficients. Now there is only one more
    // information that's needed to localize the coefficient access during
    // evaluation, namely an additional offset specific to the locus of the
    // evaluation. This is generated from the integral part of the incoming
    // coordinates during the evaluation and varies with each evaluation -
    // the DDA (data defined access). This locus-specific offset originates
    // as the integral part of the coordinate (which is an nD integral
    // coordinate), 'condensed' into an offset by multiplying it with the
    // coefficient array's stride and summing up. During the evaluation,
    // the coefficients in the window relevant to the current evaluation
    // can now be accessed by combining two values: a pointer from
    // 'cf_pointers' and the locus-specific offset. So rather than using a
    // pointer and a set of indexes we use a set of pointers and an index
    // to the same effect.
    // Why do it this way? Because of vectorization. If we want to process
    // a vector of loci, this is the most efficient way of coding the
    // operation, since all calculations which do not depend on the locus
    // are already done here in the constructor, and the vector of offsets
    // generated from the vector of loci can be used directly as a gather
    // operand for accessing the coefficients. This gather operand can
    // remain in a register throughout the entire evaluation, only the base
    // pointer this gather operand is used with changes in the course of
    // the evaluation. The problem with this approach is the fact that the
    // vector of offsets is not regular in any predictable way and may well
    // access memory locations which are far apart. Luckily this is the
    // exception, and oftentimes access will be to near memory, which is in
    // cache already.

    cf_pointers = vigra::MultiArray < 1 , const cf_ele_type * >
                    ( window_size ) ;
    cf_offsets = vigra::MultiArray < 1 , std::ptrdiff_t > ( window_size ) ;

    auto ofs_target = cf_offsets.begin() ;
    auto target = cf_pointers.begin() ;

    for ( int i = 0 ; i < window_size ; i++ )
    {
      // offsets are calculated by multiplying indexes with the coefficient
      // array's strides and summing up. So now we have offsets instead of
      // nD indices. By performing this addition now rather than passing in
      // both the pointer and the offsets, we can save a few cycles and
      // reduce register pressure.
      // Note how we subtract spline_degree/2 to obtain indexes which are
      // relative to the window's center. Annoying aside: the subtraction
      // of spline_degree/2 from *mci yields double (vigra's result type
      // for the right scalar subtraction which accepts only double as the
      // scalar), so the product with cf_estride and the result of 'sum'
      // are also double. Hence I can't code 'auto offset'.
      // We keep a record of the offset and of its sum with cf_ebase, so we
      // can choose which 'flavour' we want 'further down the line'

      std::ptrdiff_t offset = sum ( ( *mci - spline_degree / 2 )
                                    * cf_estride ) ;
      *ofs_target = offset ;
      *target = cf_ebase + offset ;

      // increment the iterators

      ++mci ;
      ++ofs_target ;
      ++target ;
    }
  }

  /// obtain_weights calculates the weights to be applied to a section of
  /// the coefficients from the fractional parts of the split coordinates.
  /// What is calculated here is the evaluation of the spline's basis
  /// function at dx, dx+/-1 ... but doing it naively is computationally
  /// expensive, as the evaluation of the spline's basis function at
  /// arbitrary values has to look at the value, find out the right
  /// interval, and then calculate the value with the appropriate function.
  /// But we always have to calculate the basis function for *all*
  /// intervals anyway, and the method used here performs this task
  /// efficiently using a vector/matrix multiplication.
  /// If the spline is more than 1-dimensional, we need a set of weights
  /// for every dimension. The weights are accessed through a 2D
  /// MultiArrayView. For every dimension, there are spline_order weights.
  /// Note that this code will process unvectorized and vectorized data
  /// alike - hence the template arguments.
  /// note that wgt[axis] contains a 'basis_functor' object (see basis.h)
  /// which encodes the generation of the set of weights.
  template < typename nd_rc_type , typename weight_type >
  void obtain_weights ( vigra::MultiArrayView < 2 , weight_type > & weight ,
                        const nd_rc_type & c ) const
  {
    auto ci = c.cbegin() ;
    for ( int axis = 0 ; axis < dimension ; ++ci , ++axis )
      wgt[axis] ( weight.data() + axis * spline_order , *ci ) ;
  }

  /// obtain weight for a single axis

  template < typename rc_type , typename weight_type >
  void obtain_weights ( weight_type * p_weight ,
                        const int & axis ,
                        const rc_type & c ) const
  {
    wgt[axis] ( p_weight , c ) ;
  }

private:

  // next we have collateral code which we keep private.
  // TODO some of the above could also be private.

  // to be able to use the same code to access the coefficients, no matter
  // if the operation is vectorized or not, we provide 'load' functions
  // which encapsulate the memory access. This allows us to uniformly
  // handle vectorized and unvectorized data: the remainder of the code
  // processes unvectorized and vectorized data alike, and only when it
  // comes to fetching the coefficients from memory we need specialized
  // code for the memory access.

  // KFJ 2018-03-15 changed the load functions to expect a pointer to
  // cf_ele_type, access memory via this pointer, then convert to some
  // given target type, rather than load to some type derived from
  // cf_ele_type. We know the coefficients are always cf_ele_type, but
  // we usually want math_type for processing. And we don't want the
  // calling code having to be 'aware' of what cf_ele_type is at all.
  // With this change, the template argument introducing the coefficient
  // type could go from the eval code, and ATD via the arguments works.

  // Note that since inner_evaluator uniformly handles data as TinyVectors,
  // 'target' in the load functions below is always a TinyVector, possibly
  // with only one element: we're using 'synthetic' types.
  /// load function for vigra::TinyVectors of fundamental T

  template < typename T , int N >
  static inline
  void load ( vigra::TinyVector < T , N > & target ,
              const cf_ele_type * const mem ,
              const int & index )
  {
    for ( int i = 0 ; i < N ; i++ )
      target[i] = T ( mem [ index + i ] ) ;
  }

  // KFJ 2018-05-08 with the automatic use of vectorization the
  // distinction whether cf_ele_type is 'vectorizable' or not
  // is no longer needed: simdized_type will be a Vc::SimdArray
  // if possible, a vspline::simd_tv otherwise.

  // dispatch, depending on whether cf_ele_type is the same as
  // what target contains. Usually the target will hold
  // 'math_ele_type', but for degree-0 splines, where the result
  // is directly derived from the coefficients, target holds
  // 'trg_ele_type'. We have the distinct cases first and the
  // dispatching routine below.

  template < typename target_t , typename index_t >
  static void load ( target_t & target ,
                     const cf_ele_type * const mem ,
                     const index_t & indexes ,
                     std::true_type )
  {
    static const size_t sz = index_t::size() ;

    for ( int e = 0 ; e < target_t::static_size ; e++ )
    {
      // directly gather to 'target'
      target[e].gather ( mem + e , indexes ) ;
    }
  }

  template < typename target_t , typename index_t >
  static void load ( target_t & target ,
                     const cf_ele_type * const mem ,
                     const index_t & indexes ,
                     std::false_type )
  {
    static const size_t sz = index_t::size() ;
    vspline::simdized_type < cf_ele_type , sz > help ;

    for ( int e = 0 ; e < target_t::static_size ; e++ )
    {
      // gather to 'help' and 'assign' to target, which effects
      // the necessary type transformation
      help.gather ( mem + e , indexes ) ;
      vspline::assign ( target[e] , help ) ;
    }
  }

  /// dispatch function for vectorized loads. We look at one criterion:
  /// - is cf_ele_type the same type as what the target object contains?
  template < typename target_t , typename index_t >
  static void inline load ( target_t & target ,
                            const cf_ele_type * const mem ,
                            const index_t & indexes )
  {
    typedef typename target_t::value_type::value_type target_ele_type ;

    load ( target , mem , indexes ,
           std::integral_constant
             < bool ,
               std::is_same < cf_ele_type , target_ele_type > :: value
             > () ) ;
  }

  /// _eval is the workhorse routine and implements the recursive
  /// arithmetic needed to evaluate the spline. First the weights for the
  /// current dimension are obtained from the weights object passed in.
  /// Once the weights are known, they are successively multiplied with the
  /// results of recursively calling _eval for the next lower dimension and
  /// the products are summed up to produce the result value. The scheme of
  /// using a recursive evaluation has several benefits: it needs no
  /// explicit intermediate storage of partial sums (uses the stack
  /// instead) and it makes the process dimension-agnostic in an elegant
  /// way. Therefore, the code is also thread-safe. Note that this routine
  /// is used for operation on braced splines, with the sequence of offsets
  /// to be visited fixed at the evaluator's construction.
  template < int level , class math1_type , class offset_type >
  struct _eval
  {
    inline
    void operator() ( const offset_type & locus ,
                      cf_pointer_iterator & cfp_iter ,
                      const vigra::MultiArrayView < 2 , math1_type >
                        & weight ,
                      vigra::TinyVector < math1_type , channels > & sum
                    ) const
    {
      const math1_type w ( weight ( 0 , level ) ) ;

      // recursively call _eval for the next lower level, receiving
      // the result in 'sum', and apply the first weight to it

      _eval < level - 1 , math1_type , offset_type >()
        ( locus , cfp_iter , weight , sum ) ;

      for ( int d = 0 ; d < channels ; d++ )
        sum[d] *= w ;

      // to pick up the result of further recursive calls:

      vigra::TinyVector < math1_type , channels > subsum ;

      // now keep calling _eval for the next lower level, receiving
      // the result in 'subsum', and apply the corresponding weight.
      // Then add the weighted subsum to 'sum'.

      for ( int i = 1 ; i < weight.shape ( 0 ) ; i++ )
      {
        const math1_type w ( weight ( i , level ) ) ;

        _eval < level - 1 , math1_type , offset_type >()
          ( locus , cfp_iter , weight , subsum ) ;

        for ( int d = 0 ; d < channels ; d++ )
          subsum[d] *= w ;

        sum += subsum ;
      }
    }
  } ;

  /// at level 0 the recursion ends, now we finally apply the weights for
  /// axis 0 to the window of coefficients. Note how cfp_iter is passed in
  /// per reference. This looks wrong, but it's necessary: When, in the
  /// course of the recursion, the level 0 routine is called again, it
  /// needs to access the next bunch of spline_order coefficients.
  ///
  /// Just incrementing the reference saves us incrementing higher up.
  /// This is the point where we access the spline's coefficients. Since
  /// _eval works for vectorized and unvectorized data alike, this access
  /// is coded as a call to 'load' which provides a uniform syntax for the
  /// memory access. The implementation of the load routines used here is
  /// just above.
  ///
  /// The access to the coefficients is a bit difficult to spot: they are
  /// accessed via cfp_iter.
  /// cfp_iter iterates over an array of readily offsetted pointers. These
  /// pointers point to all elements in a window of coefficients centered
  /// at the coefficients' base address. By adding 'locus' to one of these
  /// pointers, the resulting pointer now points to the element of a
  /// specific window of coefficients, namely that window where the
  /// coefficient subset for the current evaluation is located. locus may
  /// be a SIMD type, in which case it refers to several windows. 'locus'
  /// is the offset produced from the integral part of the coordinate(s)
  /// passed in, so it is the datum which provides the localization of the
  /// DDA, while the pointers coming from cfp_iter are constant throughout
  /// the evaluator's lifetime.

  template < class math1_type , class offset_type >
  struct _eval < 0 , math1_type , offset_type >
  {
    inline
    void operator() ( const offset_type & locus ,
                      cf_pointer_iterator & cfp_iter ,
                      const vigra::MultiArrayView < 2 , math1_type >
                        & weight ,
                      vigra::TinyVector < math1_type , channels > & sum
                    ) const
    {
      typedef vigra::TinyVector < math1_type , channels > math_type ;

      const math1_type w ( weight ( 0 , 0 ) ) ;

      // initialize 'sum' by 'loading' a coefficient (or a set of
      // coefficients if we're running vector code) - then apply
      // the first (set of) weight(s).

      load ( sum , *cfp_iter , locus ) ;

      for ( int d = 0 ; d < channels ; d++ )
        sum[d] *= w ;

      ++cfp_iter ;

      // now keep on loading coefficients, apply corresponding weights
      // and add the weighted coefficients to 'sum'

      for ( int i = 1 ; i < weight.shape ( 0 ) ; i++ )
      {
        const math1_type w ( weight ( i , 0 ) ) ;

        math_type help ;
        load ( help , *cfp_iter , locus ) ;

        for ( int d = 0 ; d < channels ; d++ )
          help[d] *= w ;

        sum += help ;
        ++cfp_iter ;
      }
    }
  } ;

  /// specialized code for degree-1 b-splines, aka linear interpolation.
  /// here, there is no gain to be had from working with precomputed
  /// per-axis weights, the weight generation is trivial.
  /// So the specialization given here is faster than using the general
  /// _eval sequence, which otherwise works just as well for degree-1
  /// splines.

  template < int level , class math1_type , class offset_type >
  struct _eval_linear
  {
    inline
    void operator() ( const offset_type& locus ,
                      cf_pointer_iterator & cfp_iter ,
                      const vigra::TinyVector < math1_type , dimension >
                        & tune ,
                      vigra::TinyVector < math1_type , channels > & sum
                    ) const
    {
      const math1_type wl ( math1_type(1) - tune [ level ] ) ;
      const math1_type wr ( tune [ level ] ) ;

      _eval_linear < level - 1 , math1_type , offset_type >()
        ( locus , cfp_iter , tune , sum ) ;

      for ( int d = 0 ; d < channels ; d++ )
        sum[d] *= wl ;

      vigra::TinyVector < math1_type , channels > subsum ;

      _eval_linear < level - 1 , math1_type , offset_type >()
        ( locus , cfp_iter , tune , subsum ) ;

      for ( int d = 0 ; d < channels ; d++ )
        subsum[d] *= wr ;

      sum += subsum ;
    }
  } ;

  /// again, level 0 terminates the recursion, again accessing the
  /// spline's coefficients with the 'load' function defined above.

  template < class math1_type , class offset_type >
  struct _eval_linear < 0 , math1_type , offset_type >
  {
    inline
    void operator() ( const offset_type & locus ,
                      cf_pointer_iterator & cfp_iter ,
                      const vigra::TinyVector < math1_type , dimension >
                        & tune ,
                      vigra::TinyVector < math1_type , channels > & sum
                    ) const
    {
      const math1_type wl ( math1_type(1) - tune [ 0 ] ) ;
      const math1_type wr ( tune [ 0 ] ) ;

      load ( sum , *cfp_iter , locus ) ;
      ++cfp_iter ;

      for ( int d = 0 ; d < channels ; d++ )
        sum[d] *= wl ;

      vigra::TinyVector < math1_type , channels > help ;

      load ( help , *cfp_iter , locus ) ;
      ++cfp_iter ;

      for ( int d = 0 ; d < channels ; d++ )
        help[d] *= wr ;

      sum += help ;
    }
  } ;

public:

  // next we have the code which is called from 'outside'. In this
  // section, incoming coordinates are split into their integral and
  // remainder part. The remainder part is used to obtain the weights to
  // apply to the spline coefficients.
  // The resulting data are then fed to the workhorse code above. We have
  // several specializations here depending on the degree of the spline.

  /// the 'innermost' eval routine is called with offset(s) and weights.
  /// This routine is public because it is used from outside (namely by
  /// grid_eval). In this final delegate we call the workhorse code in
  /// class _eval

  // TODO: what if the spline is degree 0 or 1? for these cases, grid_eval
  // should not pick this general-purpose routine

  template < class result_type , class math1_type , class offset_type >
  inline
  void eval ( const offset_type& select ,
              const vigra::MultiArrayView < 2 , math1_type > & weight ,
              result_type & result ) const
  {
    // we need an *instance* of this iterator because it's passed into
    // _eval by reference and manipulated by the code there:

    cf_pointer_iterator cfp_iter = cf_pointers.begin() ;

    // now we can call the recursive _eval routine yielding the result
    // as math_type.

    typedef vigra::TinyVector < math1_type , channels > math_type ;
    math_type _result ;

    _eval < level , math1_type , offset_type >()
      ( select , cfp_iter , weight , _result ) ;

    // finally, we assign to result, casting to 'result_type'. If _result
    // and result are of the same type, the compiler will optimize the
    // intermediate _result away.

    vspline::assign ( result , _result ) ;
  }

private:

  /// 'penultimate' eval starts from an offset to a coefficient window;
  /// here the nD integral index to the coefficient window has already
  /// been 'condensed' into a 1D offset into the coefficient array's
  /// memory. Here we have the specializations affected by the template
  /// argument 'specialize', which activates more efficient code for
  /// degree 0 (nearest neighbour) and degree 1 (linear interpolation)
  /// splines.
  /// I draw the line here; one might add further specializations, but
  /// from degree 2 onwards the weights are reused several times, so
  /// looking them up in a small table (as the general-purpose code for
  /// unspecialized operation does) should be more efficient (TODO test).
  ///
  /// we have three variants, depending on 'specialize'. first is the
  /// specialization for nearest-neighbour interpolation, which doesn't
  /// delegate further, since the result can be obtained directly by
  /// gathering from the coefficients:
  ///
  /// dispatch for nearest-neighbour interpolation (degree 0)
  /// this is trivial: we pick the coefficient(s) at position 'select' and
  /// directly convert to result_type: since the coefficient 'window' in
  /// this case has width 1 in every direction, the 'weight' to apply is 1.
  /// The general-purpose code below would iterate over this width-1
  /// window and apply the weight, which makes it slower, since both steps
  /// are futile here.

  template < class result_type , class math1_type , class offset_type >
  inline
  void eval ( const offset_type& select ,
              const vigra::TinyVector < math1_type , dimension > & tune ,
              result_type & result ,
              std::integral_constant < int , 0 > ) const
  {
    load ( result , cf_ebase , select ) ;
  }

  /// eval dispatch for linear interpolation (degree 1)
  /// again, we might use the general-purpose code below for this
  /// situation, but since the weights for linear interpolation are
  /// trivially computable, being 'tune' and 1 - 'tune', we use
  /// specialized workhorse code in _eval_linear, see there.
  template < class result_type , class math1_type , class offset_type >
  inline
  void eval ( const offset_type & select ,
              const vigra::TinyVector < math1_type , dimension > & tune ,
              result_type & result ,
              std::integral_constant < int , 1 > ) const
  {
    cf_pointer_iterator cfp_iter = cf_pointers.begin() ;

    typedef vigra::TinyVector < math1_type , channels > math_type ;
    math_type _result ;

    _eval_linear < level , math1_type , offset_type >()
      ( select , cfp_iter , tune , _result ) ;

    vspline::assign ( result , _result ) ;
  }

  /// eval dispatch for arbitrary spline degrees
  /// here we have the general-purpose routine which works for arbitrary
  /// spline degrees (as long as the code in basis.h can provide, which is
  /// currently up to degree 45). Here, the weights are calculated by
  /// accessing the b-spline basis function. With the weights at hand, we
  /// delegate to an overload of 'eval' accepting weights, see there.

  template < class result_type , class math1_type , class offset_type ,
             int arbitrary_spline_degree >
  inline
  void eval ( const offset_type& select ,
              const vigra::TinyVector < math1_type , dimension > & tune ,
              result_type & result ,
              std::integral_constant < int , arbitrary_spline_degree >
            ) const
  {
    using allocator_t
      = typename vspline::allocator_traits < math1_type > :: type ;

    // 'weight' is a 2D vigra::MultiArray of math1_type:
    // TODO: can this instantiation be tweaked in any way, like
    // by making sure the memory is neither dynamically (re)allocated
    // nor zero-initialized? This is inner-loop code after all...
    vigra::MultiArray < 2 , math1_type , allocator_t >
      weight ( vigra::Shape2 ( spline_order , dimension ) ) ;

    // obtain_weights fills the array, that's why previous initialization
    // with zero is futile:

    obtain_weights ( weight , tune ) ;
    eval ( select , weight , result ) ;
  }

public:

  /// while class evaluator accepts the argument signature of a
  /// vspline::unary_functor, class inner_evaluator uses 'synthetic'
  /// types, which are always TinyVectors - possibly of just one element.
  /// This simplifies the code, since the 'singular' arguments don't have
  /// to be treated separately. The data are just the same in memory and
  /// class evaluator simply reinterpret_casts the arguments it receives
  /// to the 'synthetic' types. Another effect of moving to the 'synthetic'
  /// types is to 'erase' their type: any 'meaning' they may have, like
  /// std::complex etc., is removed - they are treated as 'bunches' of
  /// a fundamental type or a vector. The synthetic types are built using
  /// a combination of two template template arguments: 'bunch' and
  /// 'vector':
  ///
  /// - 'bunch', which forms an aggregate of several of a given
  ///   type, like a vigra::TinyVector, which is currently the
  ///   only template used for the purpose.
  ///
  /// - 'vector', which represents an aggregate of several fundamentals
  ///   of equal type which will be processed with SIMD logic. Currently,
  ///   the templates used for the purpose are vspline::simd_tv
  ///   (simulating SIMD operations with ordinary scalar code),
  ///   Vc::SimdArray, which is a 'proper' SIMD type, and
  ///   vspline::scalar, which is used for unvectorized data.
  ///
  /// Note how 'vsize' is a template argument to this function, and
  /// not a template argument to class inner_evaluator. This is more
  /// flexible, the calling code can process any vsize with the same
  /// inner_evaluator.
  template < template < typename , int > class bunch ,
             template < typename , size_t > class vector ,
             size_t vsize >
  inline
  void eval ( const bunch < vector < rc_ele_type , vsize > ,
                            dimension > & coordinate ,
              bunch < vector < trg_ele_type , vsize > ,
                      channels > & result ) const
  {
    // derive the 'vectorized' types, depending on 'vector' and the
    // elementary types

    typedef vector < ic_ele_type , vsize > ic_ele_v ;
    typedef vector < rc_ele_type , vsize > rc_ele_v ;
    typedef vector < math_ele_type , vsize > math_ele_v ;
    typedef vector < ofs_ele_type , vsize > ofs_ele_v ;
    typedef vector < trg_ele_type , vsize > trg_ele_v ;

    // perform the coordinate split

    bunch < ic_ele_v , dimension > select ;
    bunch < rc_ele_v , dimension > _tune ;

    split ( coordinate , select , _tune ) ;

    // convert the remainders to math_type

    bunch < math_ele_v , dimension > tune ;
    vspline::assign ( tune , _tune ) ;

    // 'condense' the discrete nD coordinates into offsets

    ofs_ele_v origin = select[0] * ic_ele_type ( cf_estride [ 0 ] ) ;
    for ( int d = 1 ; d < dimension ; d++ )
      origin += select[d] * ic_ele_type ( cf_estride [ d ] ) ;

    // delegate, dispatching on 'specialize'

    eval ( origin , tune , result ,
           std::integral_constant < int , specialize > () ) ;
  }

} ; // end of inner_evaluator

} ; // namespace detail

/// class evaluator encodes evaluation of a B-spline. Technically, a
/// vspline::evaluator is a vspline::unary_functor, which has the specific
/// capability of evaluating a specific uniform b-spline. This makes it a
/// candidate to be passed to the functions in transform.h, like remap()
/// and transform(), and it also makes it suitable for vspline's
/// functional constructs like chaining, mapping, etc.
///
/// While creating and using vspline::evaluators is simple enough,
/// especially from vspline::bspline objects, there are also factory
/// functions producing objects capable of evaluating a b-spline.
These objects are wrappers around a vspline::evaluator, /// please see the factory functions make_evaluator() and make_safe_evaluator() at the /// end of this file. /// /// If you don't want to concern yourself with the details, the easiest way is to /// have a bspline object handy and use one of the factory functions, assigning the /// resulting functor to an auto variable: /// /// // given a vspline::bspline object 'bspl' /// // create an object (a functor) which can evaluate the spline /// auto ev = vspline::make_safe_evaluator ( bspl ) ; /// // which can be used like this: /// auto value = ev ( real_coordinate ) ; /// /// The evaluation relies on 'braced' coefficients, as they are provided by /// a vspline::bspline object. While the most general constructor will accept /// a raw pointer to coefficients (enclosed in the necessary 'brace'), this will rarely /// be used, and an evaluator will be constructed from a bspline object. To create an /// evaluator directly, the specific type of evaluator has to be established by providing /// the relevant template arguments. We need at least two types: the 'coordinate type' /// and the 'value type': /// /// - The coordinate type is encoded as a vigra::TinyVector of some real data type - /// doing image processing, the typical type would be a vigra::TinyVector < float , 2 >. /// fundamental real types are also accepted (for 1D splines) /// /// - The value type will usually be either a fundamental real data type such as 'float', /// or a vigra::TinyVector of such an elementary type. Other data types which can be /// handled by vigra's ExpandElementResult mechanism should also work. When processing /// colour images, your value type would typically be a vigra::TinyVector < float , 3 >. 
/// You can also process integer-valued data, in which case you may suffer from
/// quantization errors, so you should make sure your data cover the dynamic range of
/// the integer type used as well as possible (like by using the 'boost' parameter when
/// prefiltering the b-spline). Processing of integer-valued data is done using floating
/// point arithmetics internally, so the quantization error only occurs when the finished
/// result is assigned to an integer target type; if the target data type is real, the
/// result is precise (within arithmetic precision).
///
/// You can choose the data type which is used internally to do computations. Per default,
/// this will be a real type 'appropriate' to the operation, but you're free to pick some
/// other type. Note that picking integer types for the purpose is *not* allowed.
/// The relevant template argument is 'math_ele_type'.
///
/// Note that class evaluator operates with 'native' spline coordinates, which run with
/// the coefficient array's core shape, so typically from 0 to M-1 for a 1D spline over
/// M values. Access with different coordinates is most easily done by 'chaining' class
/// evaluator objects to other vspline::unary_functor objects providing coordinate
/// translation, see unary_functor.h, map.h and domain.h.
///
/// The 'native' coordinates can be thought of as an extension of the discrete coordinates
/// used to index the spline's knot points. Let's assume you have a 1D spline over knot
/// points in an array a. While you can index a with discrete coordinates like 0, 1, 2...
/// you can evaluate the spline at real coordinates like 0.0, 1.5, 7.8. If a real coordinate
/// has no fractional part, evaluation of the spline at this coordinate will produce the
/// knot point value at the index which is equal to the real coordinate, so the interpolation
/// criterion is fulfilled.
///
/// While the template arguments specify coordinate and value type for unvectorized
/// operation, the types for vectorized operation are inferred from them, using vspline's
/// vector_traits mechanism. The width of SIMD vectors to be used can be chosen explicitly.
/// This is not mandatory - if omitted, a default value is picked.
///
/// With the evaluator's type established, an evaluator of this type can be constructed by
/// passing a vspline::bspline object to its constructor. Usually, the bspline object will
/// contain data of the same value type, and the spline has to have the same number of
/// dimensions as the coordinate type. Alternatively, coefficients can be passed in as a
/// pointer into a field of suitably braced coefficients. It's okay for the spline to hold
/// coefficients of a different type: they will be cast to math_type during the evaluation.
///
/// I have already hinted at the evaluation process used, but here it is again in a nutshell:
/// The coordinate at which the spline is to be evaluated is split into its integral part
/// and a remaining fraction. The integral part defines the location where a window from the
/// coefficient array is taken, and the fractional part defines the weights to use in calculating
/// a weighted sum over this window. This weighted sum represents the result of the evaluation.
/// Coordinate splitting is done with the method split(), which picks the appropriate variant
/// (different code is used for odd and even splines).
///
/// The generation of the weights to be applied to the window of coefficients is performed
/// by employing weight functors from basis.h. What's left to do is to bring all the components
/// together, which happens in class inner_evaluator. The workhorse code in the subclass _eval
/// takes care of performing the necessary operations recursively over the dimensions of the
/// spline.
///
/// A vspline::evaluator is technically a vspline::unary_functor.
This way, it can be directly
/// used by constructs like vspline::chain and has a consistent interface which allows code
/// using evaluators to query its specifics. Since evaluation uses no conditionals on the
/// data path, the whole process can be formulated as a set of templated member functions
/// using vectorized types or unvectorized types, so the code itself is vector-agnostic.
/// This makes for a nicely compact body of code inside class inner_evaluator, at the cost of
/// having to provide a bit of collateral code to make data access syntactically uniform,
/// which is done with inner_evaluator's 'load' method.
///
/// The evaluation strategy is to have all dependencies of the evaluation except for the actual
/// coordinates taken care of by the constructor - and immutable for the evaluator's lifetime.
/// The resulting object has no state which is modified after construction, making it thread-safe.
/// It also constitutes a 'pure' functor in a functional-programming sense, because it has
/// no mutable state and no side-effects, as can be seen by the fact that the 'eval' methods
/// are all marked const.
///
/// By providing the evaluation in this way, it becomes easy for calling code to integrate
/// the evaluation into more complex functors. Consider, for example, code which generates
/// coordinates with a functor, then evaluates a b-spline at these coordinates,
/// and finally subjects the resultant values to some postprocessing. All these processing
/// steps can be bound into a single functor, and the calling code can be reduced to polling
/// this functor until it has provided the desired number of output values.
/// /// While the 'unspecialized' evaluator will try and do 'the right thing' by using general /// purpose code fit for all eventualities, for time-critical operation there are /// specializations which can be used to make the code faster: /// /// - template argument 'specialize' can be set to 0 to forcibly use (more efficient) nearest /// neighbour interpolation, which has the same effect as simply running with degree 0 but avoids /// code which isn't needed for nearest neighbour interpolation (like the application of weights, /// which is futile under the circumstances, the weight always being 1.0). /// specialize can also be set to 1 for explicit n-linear interpolation. Any other value will /// result in the general-purpose code being used. /// /// Note how the default number of vector elements is fixed by picking the value /// which vspline::vector_traits considers appropriate. There should rarely be a need to /// choose a different number of vector elements: evaluation will often be the most /// computationally intensive part of a processing chain, and therefore this choice is /// sensible. But it's not mandatory. Just keep in mind that when building processing /// pipelines, all their elements must use the *same* vectorization width. // evaluator 'wraps' detail::inner_evaluator. As a class derived from // vspline::unary_functor, it can be used by vspline's high-level code // (like transform, remap, apply). Internally, it delegates to // inner_evaluator, which operates on 'synthetic' types, so that // any peculiarities of the processed data are removed and there is // only a stringent uniform type system to deal with. // we have recent additions to the template argument list here: math_ele_type // can be specified to pick some type apart from 'default_math_type', which // should be the appropriate type in most cases. Previously, this type was // not separate but always the same as the elementary type of _cf_type. 
// This way, the code becomes more flexible - the notable new feature is
// usable processing of integral coefficient arrays. _cf_type is also new
// and allows specifying a coefficient type different from the target type.
// I introduced this as a separate entity since the arithmetics are
// done in 'math_type', and so is the 'internal' result. If the coefficients
// are, say, int, the 'internal' result might be float, and calling code may
// want the float result rather than having it cast to int. In such cases,
// a coefficient type different from the 'target type' can be specified.

/// class evaluator takes several template arguments, where the first two
/// are mandatory, while the remainder have defaults:
///
/// - _coordinate_type: type of a real coordinate, where the spline is to be
/// evaluated. This can be either a fundamental type like float or double,
/// or a vigra::TinyVector of as many elements as the spline has dimensions.
///
/// - _trg_type: this is the data type the evaluation will produce as its
/// result. While internally all arithmetic is done in 'math_type', the
/// internal result is cast to _trg_type when it's ready. _trg_type may be
/// a fundamental type or any type known to vigra's ExpandElementResult
/// mechanism, like vigra::TinyVector. It has to have as many channels as
/// the coefficients of the spline (or the knot point data).
///
/// - _vsize: width of SIMD vectors to use for vectorized operation.
/// While class inner_evaluator takes this datum as a template argument to
/// its eval routine, here it's a template argument to the evaluator class.
/// This is so because class evaluator inherits from vspline::unary_functor,
/// which also requires a specific vector size, because otherwise type
/// erasure using std::functions would not be possible. While you may
/// choose an arbitrary _vsize, only small multiples of the hardware vector
/// width of the target machine will produce the most efficient code.
Passing
/// 1 here will result in unvectorized code.
///
/// - _math_ele_type: elementary type to use for arithmetic in class
/// inner_evaluator. While in most cases default_math_type will be just right,
/// the default may be overridden. _math_ele_type must be a real data type.
///
/// - _cf_type: data type of the coefficients of the spline. Normally this
/// will be the same as _trg_type above, but this is not mandatory. The
/// coefficients will be converted to math_type once they have been loaded
/// from memory, and all arithmetic is done in math_type.

template < typename _coordinate_type ,
           typename _trg_type ,
           size_t _vsize = vspline::vector_traits < _trg_type > :: size ,
           int _specialize = -1 ,
           typename _math_ele_type
             = default_math_type < _coordinate_type , _trg_type > ,
           typename _cf_type = _trg_type >
class evaluator
: public unary_functor < _coordinate_type , _trg_type , _vsize > ,
  public vspline::callable
         < evaluator < _coordinate_type , _trg_type , _vsize ,
                       _specialize , _math_ele_type , _cf_type > ,
           _coordinate_type , _trg_type , _vsize >
{
public:

  // pull in the template arguments

  typedef _coordinate_type coordinate_type ;
  typedef _cf_type cf_type ;
  typedef _math_ele_type math_ele_type ;
  typedef _trg_type trg_type ;

  // we figure out the elementary types and some enums which we'll
  // use to specify the type of 'inner_evaluator' we'll use. This is
  // the 'analytic' part of dealing with the types, inner_evaluator
  // does the 'synthetic' part.

  typedef int ic_ele_type ;
  typedef int ofs_ele_type ;
  typedef ET < coordinate_type > rc_ele_type ;
  typedef ET < cf_type > cf_ele_type ;
  typedef ET < trg_type > trg_ele_type ;

  enum { vsize = _vsize } ;

  // we want to access facilities of the base class (vspline::unary_functor)
  // so we use a typedef for the base class. class evaluator's property of
  // being derived from vspline::unary_functor provides its 'face' to
  // calling code, while its inner_evaluator provides the implementation of
  // its capabilities.
typedef unary_functor < coordinate_type , trg_type , vsize > base_type ;

  enum { dimension = base_type::dim_in } ;
  enum { level = dimension - 1 } ;
  enum { channels = base_type::dim_out } ;
  enum { specialize = _specialize } ;

  // now we can define the type of the 'inner' evaluator.
  // we pass all elementary types we intend to use, plus the number of
  // dimensions and channels, and the 'specialize' parameter which activates
  // specialized code for degree-0 and -1 splines.

  typedef detail::inner_evaluator < ic_ele_type ,
                                    rc_ele_type ,
                                    ofs_ele_type ,
                                    cf_ele_type ,
                                    math_ele_type ,
                                    trg_ele_type ,
                                    dimension ,
                                    channels ,
                                    specialize > inner_type ;

  // class evaluator has an object of this type as its sole member:

  const inner_type inner ;

private:

  /// feeder function. This is private, since it performs potentially
  /// dangerous reinterpret_casts which aren't meant for 'the public',
  /// but only for use by the 'eval' methods below, which provide the
  /// interface expected of a vspline::unary_functor.
  /// The cast reinterprets the arguments as the corresponding
  /// 'synthetic' types, using the templates 'bunch' and 'vector'.
  /// The reinterpreted data are fed to 'inner'.

  template < template < typename , int > class bunch ,
             template < typename , size_t > class vector ,
             size_t VSZ , typename in_type , typename out_type >
  inline void feed ( const in_type & _coordinate ,
                     out_type & _result ) const
  {
    typedef bunch < vector < rc_ele_type , VSZ > , dimension > rc_t ;
    typedef bunch < vector < trg_ele_type , VSZ > , channels > trg_t ;
    auto const & coordinate = reinterpret_cast < rc_t const & > ( _coordinate ) ;
    auto & result = reinterpret_cast < trg_t & > ( _result ) ;
    inner.template eval < bunch , vector , VSZ > ( coordinate , result ) ;
  }

public:

  /// unvectorized evaluation function. This is delegated to 'feed'
  /// above, which reinterprets the arguments as the 'synthetic' types
  /// used by class inner_evaluator.
inline void eval ( const typename base_type::in_type & _coordinate , typename base_type::out_type & _result ) const { feed < vigra::TinyVector , vspline::scalar , 1 > ( _coordinate , _result ) ; } /// vectorized evaluation function. This is enabled only if vsize > 1 /// to guard against cases where vsize is 1. Without the enable_if, we'd /// end up with two overloads with the same signature if vsize is 1. /// Again we delegate to 'feed' to reinterpret the arguments, this time /// passing vspline::simdized_type for 'vector'. template < typename = std::enable_if < (vsize>1) > > inline void eval ( const typename base_type::in_v & _coordinate , typename base_type::out_v & _result ) const { feed < vigra::TinyVector , vspline::simdized_type , vsize > ( _coordinate , _result ) ; } typedef vigra::TinyVector < std::ptrdiff_t , dimension > shape_type ; typedef vigra::TinyVector < int , dimension > derivative_spec_type ; /// class evaluator's constructors are used to initialize 'inner'. /// This first constructor overload will rarely be used by calling /// code; the commonly used overload is the next one down taking a /// vspline::bspline object. evaluator ( const cf_ele_type * const cf_ebase , const shape_type & cf_estride , int spline_degree , derivative_spec_type derivative = derivative_spec_type ( 0 ) ) : inner ( cf_ebase , cf_estride , spline_degree , derivative ) { } ; /// constructor taking a vspline::bspline object, and, optionally, /// a specification for derivatives of the spline and 'shift'. evaluator ( const vspline::bspline < cf_type , dimension > & bspl , derivative_spec_type derivative = derivative_spec_type ( 0 ) , int shift = 0 ) : evaluator ( (cf_ele_type*) ( bspl.core.data() ) , channels * bspl.core.stride() , bspl.spline_degree + shift , derivative ) { // while the general constructor above has already been called, // we haven't yet made certain that a requested shift has resulted // in a valid evaluator. 
We check this now and throw an exception
    // if the shift was illegal.

    if ( ! bspl.shiftable ( shift ) )
      throw not_supported
            ( "insufficient frame size. the requested shift can not be performed." ) ;
  } ;

} ; // end of class evaluator

// in the next section we have the collateral code needed to implement
// the factory functions make_evaluator() and make_safe_evaluator().
// This code uses class evaluator as a vspline::unary_functor, so the
// objects which are produced by the factory functions can only handle
// a fixed vsize, in contrast to class inner_evaluator, which can
// process 'synthetic' arguments with a wider spectrum.

namespace detail
{

/// helper object to create a type-erased vspline::evaluator for
/// a given bspline object. The evaluator is specialized to the
/// spline's degree, so that degree-0 splines are evaluated with
/// nearest neighbour interpolation, degree-1 splines with linear
/// interpolation, and all other splines with general b-spline
/// evaluation. The resulting vspline::evaluator is 'grokked' to
/// erase its type to make it easier to handle on the receiving
/// side: build_ev will always return a vspline::grok_type, not
/// one of the several possible evaluators which it produces
/// initially. Why the type erasure? Because a function can only
/// return one distinct type. With specialization for degree-0,
/// degree-1 and arbitrary spline degrees, there are three distinct
/// types of evaluator to take care of. If they are to be returned
/// as a common type, type erasure is the only way.
template < typename spline_type ,
           typename rc_type ,
           size_t _vsize ,
           typename math_ele_type ,
           typename result_type >
struct build_ev
{
  vspline::grok_type < bspl_coordinate_type < spline_type , rc_type > ,
                       result_type , _vsize >
  operator() ( const spline_type & bspl ,
               vigra::TinyVector < int , spline_type::dimension > dspec
                 = vigra::TinyVector < int , spline_type::dimension > ( 0 ) ,
               int shift = 0 )
  {
    typedef bspl_coordinate_type < spline_type , rc_type > crd_t ;
    typedef bspl_value_type < spline_type > value_type ;

    if ( bspl.spline_degree == 0 )
      return vspline::grok
             ( vspline::evaluator < crd_t , result_type , _vsize , 0 ,
                                    math_ele_type , value_type >
               ( bspl , dspec , shift ) ) ;
    else if ( bspl.spline_degree == 1 )
      return vspline::grok
             ( vspline::evaluator < crd_t , result_type , _vsize , 1 ,
                                    math_ele_type , value_type >
               ( bspl , dspec , shift ) ) ;
    else
      return vspline::grok
             ( vspline::evaluator < crd_t , result_type , _vsize , -1 ,
                                    math_ele_type , value_type >
               ( bspl , dspec , shift ) ) ;
  }
} ;

/// helper object to create a vspline::mapper object with gate types
/// matching a bspline's boundary conditions and extents matching the
/// spline's lower and upper limits. Please note that these limits
/// depend on the boundary conditions and are not always simply
/// 0 and N-1, as they are for, say, mirror boundary conditions.
/// see lower_limit() and upper_limit() in vspline::bspline.
///
/// gate types are inferred from boundary conditions like this:
///
/// PERIODIC -> periodic_gate
/// MIRROR, REFLECT -> mirror_gate
/// all other boundary conditions -> clamp_gate
///
/// The mapper object is chained to an evaluator, resulting in
/// a functor providing safe access to the evaluator. The functor
/// is subsequently 'grokked' to produce a uniform return type.
///
/// Please note that this is only one possible way of dealing with
/// out-of-bounds coordinates: they are mapped into the defined range
/// in a way that is coherent with the boundary conditions.
If you
/// need other methods you'll have to build your own functional
/// construct.
///
/// While build_ev (above) had three distinct types to deal with,
/// here, the number of potential types is even larger: every distinct
/// boundary condition along every distinct axis will result in a specific
/// type of 'gate' object. So again we use type erasure to provide a
/// common return type, namely vspline::grok_type.

template < int level ,
           typename spline_type ,
           typename rc_type ,
           size_t _vsize ,
           typename math_ele_type ,
           typename result_type ,
           class ... gate_types >
struct build_safe_ev
{
  vspline::grok_type < bspl_coordinate_type < spline_type , rc_type > ,
                       result_type , _vsize >
  operator() ( const spline_type & bspl ,
               gate_types ... gates ,
               vigra::TinyVector < int , spline_type::dimension > dspec
                 = vigra::TinyVector < int , spline_type::dimension > ( 0 ) ,
               int shift = 0 )
  {
    // find out the spline's lower and upper limit for the current level

    rc_type lower ( bspl.lower_limit ( level ) ) ;
    rc_type upper ( bspl.upper_limit ( level ) ) ;

    // depending on the spline's boundary condition for the current
    // level, construct an appropriate gate object and recurse to
    // the next level. If the core's shape along this axis (level)
    // is 1, always clamp to zero. Note how for BC NATURAL the coordinate
    // is also clamped, because we can't produce the point-mirrored
    // continuation of the signal with a coordinate manipulation.

    auto bc = bspl.bcv [ level ] ;
    if ( bspl.core.shape ( level ) == 1 )
    {
      bc = vspline::CONSTANT ;
      lower = upper = rc_type ( 0 ) ;
    }
    switch ( bc )
    {
      case vspline::PERIODIC:
      {
        auto gt = vspline::periodic < rc_type , _vsize > ( lower , upper ) ;
        return build_safe_ev < level - 1 , spline_type , rc_type , _vsize ,
                               math_ele_type , result_type ,
                               decltype ( gt ) , gate_types ... >()
               ( bspl , gt , gates ...
, dspec , shift ) ;
        break ;
      }
      case vspline::MIRROR:
      case vspline::REFLECT:
      {
        auto gt = vspline::mirror < rc_type , _vsize > ( lower , upper ) ;
        return build_safe_ev < level - 1 , spline_type , rc_type , _vsize ,
                               math_ele_type , result_type ,
                               decltype ( gt ) , gate_types ... >()
               ( bspl , gt , gates ... , dspec , shift ) ;
        break ;
      }
      default:
      {
        auto gt = vspline::clamp < rc_type , _vsize >
                  ( lower , upper , lower , upper ) ;
        return build_safe_ev < level - 1 , spline_type , rc_type , _vsize ,
                               math_ele_type , result_type ,
                               decltype ( gt ) , gate_types ... >()
               ( bspl , gt , gates ... , dspec , shift ) ;
        break ;
      }
    }
  }
} ;

/// at level -1, there are no more axes to deal with, here the recursion
/// ends and the actual mapper object is created. Specializing on the
/// spline's degree (0, 1, or indeterminate), an evaluator is created
/// and chained to the mapper object. The resulting functor is grokked
/// to produce a uniform return type, which is returned to the caller.

template < typename spline_type ,
           typename rc_type ,
           size_t _vsize ,
           typename math_ele_type ,
           typename result_type ,
           class ... gate_types >
struct build_safe_ev < -1 , spline_type , rc_type , _vsize ,
                       math_ele_type , result_type , gate_types ... >
{
  vspline::grok_type < bspl_coordinate_type < spline_type , rc_type > ,
                       result_type , _vsize >
  operator() ( const spline_type & bspl ,
               gate_types ... gates ,
               vigra::TinyVector < int , spline_type::dimension > dspec
                 = vigra::TinyVector < int , spline_type::dimension > ( 0 ) ,
               int shift = 0 )
  {
    typedef bspl_coordinate_type < spline_type , rc_type > crd_t ;
    typedef bspl_value_type < spline_type > value_type ;

    if ( bspl.spline_degree == 0 )
      return vspline::grok
             ( vspline::mapper < crd_t , _vsize , gate_types... > ( gates ... )
               + vspline::evaluator < crd_t , result_type , _vsize , 0 ,
                                      math_ele_type , value_type >
                 ( bspl , dspec , shift ) ) ;
    else if ( bspl.spline_degree == 1 )
      return vspline::grok
             ( vspline::mapper < crd_t , _vsize , gate_types... > ( gates ...
)
               + vspline::evaluator < crd_t , result_type , _vsize , 1 ,
                                      math_ele_type , value_type >
                 ( bspl , dspec , shift ) ) ;
    else
      return vspline::grok
             ( vspline::mapper < crd_t , _vsize , gate_types... > ( gates ... )
               + vspline::evaluator < crd_t , result_type , _vsize , -1 ,
                                      math_ele_type , value_type >
                 ( bspl , dspec , shift ) ) ;
  }
} ;

} ; // namespace detail

/// make_evaluator is a factory function, producing a functor
/// which provides access to an evaluator object. Evaluation
/// using the resulting object is *not* intrinsically safe,
/// it's the user's responsibility not to pass coordinates
/// which are outside the spline's defined range. If you need
/// safe access, see 'make_safe_evaluator' below. 'Not safe'
/// in this context means that evaluation at out-of-bounds
/// locations may result in a memory fault or produce wrong or
/// undefined results. Note that vspline's bspline objects
/// are set up so as to allow evaluation at the lower and upper
/// limit of the spline and will also tolerate values 'just
/// outside' the bounds to guard against quantization errors.
/// see vspline::bspline for details.
///
/// The evaluator will be specialized to the spline's degree:
/// degree 0 splines will be evaluated with nearest neighbour,
/// degree 1 splines with linear interpolation, all other splines
/// will use general b-spline evaluation.
///
/// This function returns the evaluator wrapped in an object which
/// hides its type. This object only 'knows' what coordinates it
/// can take and what values it will produce. The extra level of
/// indirection costs a bit of performance, but having a common type
/// simplifies handling. The wrapped evaluator also provides operator().
/// /// So, if you have a vspline::bspline object 'bspl', you can use this /// factory function like this: /// /// auto ev = make_evaluator ( bspl ) ; /// typedef typename decltype(ev)::in_type coordinate_type ; /// coordinate_type c ; /// auto result = ev ( c ) ; /// /// make_evaluator requires one template argument: spline_type, the /// type of the vspline::bspline object you want to have evaluated. /// Optionally, you can specify the elementary type for coordinates /// (use either float or double) and the vectorization width. The /// latter will only have an effect if vectorization is used and /// the spline's data type can be vectorized. Per default, the /// vectorization width will be inferred from the spline's value_type /// by querying vspline::vector_traits, which tries to provide a /// 'good' choice. Note that a specific evaluator will only be capable /// of processing vectorized coordinates of precisely the _vsize it /// has been created with. A recent addition to the template argument /// list is 'math_ele_type', which allows you to pick a different type /// for internal processing than the default. The default is a real /// type 'appropriate' to the data in the spline. /// /// Note that the object created by this factory function will /// only work properly if you evaluate coordinates of the specific /// 'rc_type' used. If you create it with the default rc_type, which /// is float (and usually sufficiently precise for a coordinate), you /// can't evaluate double precision coordinates with it. /// /// On top of the bspline object, you can optionally pass a derivative /// specification and a shift value, which are simply passed through /// to the evaluator's constructor, see there for the meaning of these /// optional parameters. 
///
/// While the declaration of this function looks frightfully complex,
/// using it is simple: in most cases it's simply
///
/// auto ev = make_evaluator ( bspl ) ;
///
/// For an explanation of the template arguments, please see
/// make_safe_evaluator() below, which takes the same template args.

template < class spline_type ,
           typename rc_type = float ,
           size_t _vsize = vspline::vector_traits
                           < typename spline_type::value_type > :: size ,
           typename math_ele_type
             = default_math_type < typename spline_type::value_type , rc_type > ,
           typename result_type = typename spline_type::value_type >
vspline::grok_type < bspl_coordinate_type < spline_type , rc_type > ,
                     result_type , _vsize >
make_evaluator ( const spline_type & bspl ,
                 vigra::TinyVector < int , spline_type::dimension > dspec
                   = vigra::TinyVector < int , spline_type::dimension > ( 0 ) ,
                 int shift = 0 )
{
  typedef typename spline_type::value_type value_type ;
  typedef typename vigra::ExpandElementResult < value_type > :: type ele_type ;
  enum { vsize = _vsize } ;
  return detail::build_ev < spline_type , rc_type , _vsize ,
                            math_ele_type , result_type >()
         ( bspl , dspec , shift ) ;
}

/// make_safe_evaluator is a factory function, producing a functor
/// which provides safe access to an evaluator object. This functor
/// will map incoming coordinates into the spline's defined range,
/// as given by the spline with its lower_limit and upper_limit
/// methods, honoring the bspline object's boundary conditions.
/// So if, for example, the spline is periodic, all incoming
/// coordinates are valid and will be mapped to the first period.
/// Note the use of lower_limit and upper_limit. These values
/// also depend on the spline's boundary conditions, please see
/// class vspline::bspline for details. If there is no way to
/// meaningfully fold a coordinate into the defined range, the
/// coordinate is clamped to the nearest limit.
///
/// The evaluator will be specialized to the spline's degree:
/// degree 0 splines will be evaluated with nearest neighbour,
/// degree 1 splines with linear interpolation, all other splines
/// will use general b-spline evaluation.
///
/// This function returns the functor wrapped in an object which
/// hides its type. This object only 'knows' what coordinates it
/// can take and what values it will produce. The extra level of
/// indirection costs a bit of performance, but having a common type
/// simplifies handling: the type returned by this function only
/// depends on the spline's data type, the coordinate type and
/// the vectorization width.
///
/// Also note that the object created by this factory function will
/// only work properly if you evaluate coordinates of the specific
/// 'rc_type' used. If you create it with the default rc_type, which
/// is float (and usually sufficiently precise for a coordinate), you
/// can't evaluate double precision coordinates with it.
///
/// On top of the bspline object, you can optionally pass a derivative
/// specification and a shift value, which are simply passed through
/// to the evaluator's constructor, see there for the meaning of these
/// optional parameters.
///
/// While the declaration of this function looks frightfully complex,
/// using it is simple: in most cases it's simply
///
/// auto ev = make_safe_evaluator ( bspl ) ;
///
/// The first template argument, spline_type, is the type of a
/// vspline::bspline object. This template argument has no default,
/// since it determines the dimensionality and the coefficient type.
/// But since the first argument to this factory function is of
/// this type, spline_type can be fixed via ATD, so it can be
/// omitted.
///
/// The second template argument, rc_type, can be used to pick a
/// different elementary type for the coordinates the evaluator will
/// accept. In most cases the default, float, will be sufficient.
///
/// The next template argument, _vsize, fixes the vectorization width.
/// Per default, this will be what vspline deems appropriate for the
/// spline's coefficient type.
///
/// math_ele_type can be used to specify a different fundamental type
/// to be used for arithmetic operations during evaluation. The default
/// used here is a real type of at least the precision of the coordinates
/// or the spline's coefficients, but you may want to raise precision
/// here, for example by passing 'double' while your data are all float.
///
/// Finally you can specify a result type. Per default the result will
/// be of the same type as the spline's coefficients, but you may want
/// a different value here - a typical example would be a spline with
/// integral coefficients, where you might prefer to get the result in,
/// say, float to avoid quantization errors on the conversion from the
/// 'internal' result (which is in math_type) to the output.

template < class spline_type ,
           typename rc_type = float ,
           size_t _vsize = vspline::vector_traits
                           < typename spline_type::value_type > :: size ,
           typename math_ele_type
             = default_math_type < typename spline_type::value_type , rc_type > ,
           typename result_type = typename spline_type::value_type >
vspline::grok_type < bspl_coordinate_type < spline_type , rc_type > ,
                     result_type , _vsize >
make_safe_evaluator ( const spline_type & bspl ,
                      vigra::TinyVector < int , spline_type::dimension > dspec
                        = vigra::TinyVector < int , spline_type::dimension > ( 0 ) ,
                      int shift = 0 )
{
  typedef typename spline_type::value_type value_type ;
  typedef typename vigra::ExpandElementResult < value_type > :: type ele_type ;
  enum { vsize = _vsize } ;
  return detail::build_safe_ev < spline_type::dimension - 1 ,
                                 spline_type , rc_type , _vsize ,
                                 math_ele_type , result_type >()
         ( bspl , dspec , shift ) ;
} ;

} ; // end of namespace vspline

#endif // VSPLINE_EVAL_H
/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. 
*/ /* */ /************************************************************************/ /// \file anytype.cc /// /// \brief demonstrates the three modes of vectorization: /// scalar, emulated and with Vc /// /// compile: clang++ -std=c++11 -o anytype anytype.cc /// or: clang++ -std=c++11 -DUSE_VC -o anytype anytype.cc -lVc /// /// Do try out using g++ as well, and use various levels of optimization. /// When I did these tests, I found no conclusive results with the speed /// tests which would warrant a clear recommendation. But then, the maths /// used are so trivial that optimization may come up with really clever /// ways of speeding them up, and surely a lot of time is spent on the /// memory access. The main point of this program isn't the speed tests, /// though, but the demonstration of the vectorization modes. #include <iostream> #include <chrono> #include <cassert> // for this demonstration, we don't want multithreading: // #define VSPLINE_SINGLETHREAD #include <vspline/vspline.h> // we use a type 'test_t' which is 'monolithic', meaning it can't be // processed by vigra's ExpandElementResult mechanism: struct test_t { char padding[12] ; // just for 'padding' } ; // just so we can << test_t to cout std::ostream & operator<< ( std::ostream & osr , const test_t & x ) { osr << "test_t" ; return osr ; } ; // We define two functors for this type: one scalar functor, which // is realized by setting the template argument 'vsize' to 1: struct scalar_f : public vspline::unary_functor < test_t , test_t , 1 > { void eval ( const test_t & in , test_t & out ) const { std::cout << "scalar_f: " << in << std::endl ; out = in ; } } ; // another functor with no specified vsize, resulting in the // use of the default in simdize_traits. 
Here we provide an eval // overload matching vectorized arguments struct poly_f : public vspline::unary_functor < test_t , test_t > { void eval ( const test_t & in , test_t & out ) const { std::cout << "single-value poly_f: " << in << std::endl ; out = in ; } template < typename in_t , typename out_t > void eval ( const in_t & in , out_t & out ) const { std::cout << "vectorized poly_f: " << in << std::endl ; out = in ; } } ; // a functor processing 'double', which can be // processed by Vc if it is present. Note how vspline::unary_functor // 'looks' at vspline::vector_traits to 'glean' the type of vectorized // arguments it can process: if Vc is present, the code in vector.h // finds specializations of 'simd_traits' which 'produce' Vc::SimdArray // as 'type'. So this type appears in vector_traits and, from there, // is taken to be the vectorized argument type, which is of the same // type for input and output, in this case. Since the inheritance from // vspline::unary_functor is public, we can readily use this type by // the name 'in_v' and 'out_v'. If the unvectorized and the vectorized // eval method can share code, the operation can be coded as a member // function template, see the variant below. struct double_f : public vspline::unary_functor < double , double > { void eval ( const double & in , double & out ) const { std::cout << "unvectorized double_f: " << in << std::endl ; out = in ; } void eval ( const in_v & in , out_v & out ) const { std::cout << "vectorized double_f: " << in << std::endl ; out = in ; } } ; // Like in the previous example, all the code actually does (apart from // echoing to std::cout) is pass the input through, but here a single // member function template handles both the scalar and the vectorized // case: struct t_double_f : public vspline::unary_functor < double , double > { template < typename IN , typename OUT > void eval ( const IN & in , OUT & out ) const { std::cout << "t_double_f using eval template: " << in << std::endl ; out = in ; } } ; // Here we have a functor doing some arithmetic, which we'll 'apply' // to a large array, measuring execution time. 
struct sample_maths : public vspline::unary_functor < float , float > { template < typename IN , typename OUT > void eval ( const IN & in , OUT & out ) const { auto twice = in + in ; twice *= ( 3.5f + in / 5.5f ) ; out = twice - twice / 2.2f ; } } ; typedef vigra::TinyVector < float , 3 > f3_t ; struct sample_maths3 : public vspline::unary_functor < f3_t , f3_t > { template < typename IN , typename OUT > void eval ( const IN & in , OUT & out ) const { IN twice ; for ( int e = 0 ; e < 3 ; e++ ) { twice[e] = in[e] + in[e] ; twice[e] *= ( 3.5f + in[e] / 5.5f ) ; out[e] = twice[e] - twice[e] / 2.2f ; } } } ; #ifdef USE_VC // vector data like Vc::float_v are 'monolithic', since there is no // element-expansion defined for them. So they can be used as arguments // for a unary functor, and 'vectorizing' them will gather several of // them in a simd_tv. While this is not used inside vspline and such // data aren't currently usable in b-splines, it's nice to have, since // the processing of an array of vector data is automatically // multithreaded and the vectors are processed in small batches without // having to form SimdArrays from them. struct float_v_f : public vspline::unary_functor < Vc::float_v , Vc::float_v > { void eval ( const Vc::float_v & in , Vc::float_v & out ) const { out = in + 3.0f ; std::cout << "single-value float_v_f: " << out << std::endl ; } template < typename in_t , typename out_t > void eval ( const in_t & in , out_t & out ) const { for ( int e = 0 ; e < vsize ; e++ ) out[e] = in[e] + 3.0f ; std::cout << "vectorized float_v_f: " << out << std::endl ; } } ; #endif // using vspline's 'apply' function, we apply the functors to // arrays holding appropriate data and observe the output. Use of // scalar_f processes every element singly, the other calls // perform 'peeling': as long as full vectors can be formed, the // vectorized eval overload is called, followed by single-element // processing for the leftovers. 
// If the program was compiled with Vc (-DUSE_VC), we can observe // that the third call actually processes Vc data during the peeling // stage. int main ( int argc , char * argv[] ) { vigra::MultiArray < 1 , test_t > sa ( 18 ) ; vigra::MultiArray < 1 , double > da ( 18 ) ; vspline::apply ( scalar_f() , sa ) ; std::cout << std::endl ; vspline::apply ( poly_f() , sa ) ; std::cout << std::endl ; vspline::apply ( double_f() , da ) ; std::cout << std::endl ; vspline::apply ( t_double_f() , da ) ; #ifdef USE_VC // If we have an array with vector data, we can use 'apply' to // feed the vector data to a compatible unary_functor: vigra::MultiArray < 1 , Vc::float_v > fva ( 5 ) ; std::cout << std::endl ; vspline::apply ( float_v_f() , fva ) ; #endif // For the speed test, we set up a 3D array of 1 gigafloat. vigra::MultiArray < 3 , float > fa ( vigra::Shape3 ( 1024 , 1024 , 1024 ) ) ; // let's use this value for testing: float a = 7.7f ; float b ; // applying 'functor' to a, we obtain the result in b std::cout << std::endl ; auto functor = sample_maths() ; functor.eval ( a , b ) ; std::cout << "functor ( " << a << " ) = " << b << std::endl ; // we initialize 'fa' with a fa = a ; // and run the speed test std::cout << "speed test with vspline::apply" << std::endl ; auto start = std::chrono::system_clock::now() ; vspline::apply ( functor , fa ) ; auto end = std::chrono::system_clock::now() ; std::cout << "processing array 'fa' took " << std::chrono::duration_cast < std::chrono::milliseconds > ( end - start ) . count() << " ms" << std::endl ; // we make sure we're not being fooled by taking a sample assert ( fa [ vigra::Shape3 ( 500 , 500 , 500 ) ] == b ) ; // as a reference, we use 'functor' in a scalar loop. Since the maths // involved are trivial, the compiler can do very well autovectorizing // the operation, so this operation is fast and we can assume that // our efforts at vectorization were successful if the previous test // did take roughly the same time. 
It turns out that this depends // very much on the compiler used and the optimization level. std::cout << "speed test using scalar loop" << std::endl ; fa = a ; start = std::chrono::system_clock::now() ; float * pf = fa.data() ; for ( int i = 0 ; i < fa.size() ; i++ ) functor.eval ( pf[i] , pf[i] ) ; end = std::chrono::system_clock::now() ; std::cout << "processing array 'fa' took " << std::chrono::duration_cast < std::chrono::milliseconds > ( end - start ) . count() << " ms" << std::endl ; // this actually fails with clang++ and -Ofast: assert ( fa [ vigra::Shape3 ( 500 , 500 , 500 ) ] == b ) ; vigra::MultiArray < 3 , f3_t > fa3 ( vigra::Shape3 ( 1024 , 1024 , 256 ) ) ; start = std::chrono::system_clock::now() ; auto functor3 = sample_maths3() ; vspline::apply ( functor3 , fa3 ) ; end = std::chrono::system_clock::now() ; std::cout << "processing array 'fa3' took " << std::chrono::duration_cast < std::chrono::milliseconds > ( end - start ) . count() << " ms" << std::endl ; start = std::chrono::system_clock::now() ; auto pf3 = fa3.data() ; for ( int i = 0 ; i < fa3.size() ; i++ ) functor3.eval ( pf3[i] , pf3[i] ) ; end = std::chrono::system_clock::now() ; std::cout << "processing array 'fa3' with scalar loop took " << std::chrono::duration_cast < std::chrono::milliseconds > ( end - start ) . count() << " ms" << std::endl ; } /************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2018 by Kay F. 
Jahnke */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /// \file bls.cpp /// /// \brief fidelity test /// /// This is a test to see how much a signal degrades when it is submitted /// to a sequence of operations: /// /// - create a b-spline over the signal /// - evaluate the spline at unit-spaced locations with an arbitrary offset /// - yielding a shifted signal, for which the process is repeated /// /// Finally, a last shift is performed which samples the penultimate version /// of the signal at points coinciding with coordinates 0...N-1 of the /// original signal. This last iteration should ideally recreate the /// original sequence. /// /// The test is done with a periodic signal to avoid margin effects. 
/// The initial sequence is created by performing an IFFT on random /// data up to a given cutoff frequency, producing a band-limited signal. /// Alternatively, by modifying the initial values which are put into /// the frequency-domain representation of the test signal, more specific /// scenarios can be investigated (try setting the values to 1 uniformly etc) /// With the cutoff factor approaching 1.0, the signal contains more and higher /// high frequencies, making it ever more difficult for the spline to /// represent the data faithfully. With a cutoff of <= .5, fidelity is /// very high for high-degree b-splines; even after many iterations /// the spline is nearly unchanged and faithfully represents the /// original signal. /// /// There are two factors to observe: high-degree splines cope better with /// high frequency components, but they suffer from the large magnitude of /// coefficient values, making the evaluation error larger - especially since /// high-degree splines have wide support. /// /// An important conclusion is that even though all input signals produced /// with this test are band-limited by design, this quality is not sufficient /// to produce a b-spline from them which is 'stable' in the sense of being /// immune to the multiple shifting operation, unless the signal is free from /// high frequencies. /// /// If the input contains high frequencies, and yet a faithful b-spline /// representation has to be provided which is immune to shifting, one /// solution is to create a high-degree spline over the signal, sample it /// at half unit steps and use the resulting signal to create the 'working' /// spline. By creating the high-order 'super-spline', a signal is created /// which is low in high-frequency content and can be faithfully represented /// by the working spline. The upsampling can also be done in any other /// way, like FFT/IFFT. See n_shift.cc for an implementation of upsampling /// using a high-degree b-spline as 'super-spline'. 
/// /// note that this example requires fftw3. /// /// compile with: clang++ -O3 -std=c++11 -obls bls.cpp -pthread -lvigraimpex -lfftw3 /// /// invoke with: bls <degree> <iterations> [ <cutoff> ] #include <iostream> #include <random> #include <vspline/vspline.h> #include <vigra/multi_math.hxx> #include <vigra/accumulator.hxx> #include <vigra/fftw3.hxx> int main ( int argc , char * argv[] ) { if ( argc < 3 ) { std::cerr << "pass the spline's degree and the number of iterations" << std::endl << "and, optionally, the cutoff frequency" << std::endl ; exit ( -1 ) ; } int degree = std::atoi ( argv[1] ) ; assert ( degree >= 0 && degree <= vspline_constants::max_degree ) ; int iterations = std::max ( 1 , std::atoi ( argv[2] ) ) ; double f_cutoff = .5 ; if ( argc == 4 ) { f_cutoff = std::atof ( argv[3] ) ; std::cout << "using frequency cutoff " << f_cutoff << std::endl ; } const int sz = 1024 ; vigra::MultiArray < 1 , double > original ( sz ) ; vigra::MultiArray < 1 , double > target ( sz ) ; vigra::MultiArray < 1 , vigra::FFTWComplex < double > > fourier ( original.shape() / 2 + 1 ) ; std::random_device rd ; std::mt19937 gen ( rd() ) ; gen.seed ( 42 ) ; // level playing field std::uniform_real_distribution<> dis ( -1 , 1 ) ; int fill = f_cutoff * sz / 2.0 ; for ( auto & e : fourier ) { if ( fill < 0 ) e = vigra::FFTWComplex < double > ( 0.0 , 0.0 ) ; else e = vigra::FFTWComplex < double > ( 1.0 , 0.0 ) ; // ( dis(gen) , dis(gen) ) ; --fill ; } vigra::fourierTransformInverse ( fourier , original ) ; // now we set up the working spline auto m = original.norm ( 0 ) ; // . 
minimax ( &a , &b ) ; original /= m ; vspline::bspline < double , // spline's data type 1 > // one dimension bspw ( sz , // sz values degree , // degree as per command line vspline::PERIODIC , // periodic boundary conditions 0.0 ) ; // no tolerance // we pull in the working data we've just generated bspw.prefilter ( original ) ; using namespace vigra::multi_math ; using namespace vigra::acc; vigra::MultiArray < 1 , double > error_array ( vigra::multi_math::squaredNorm ( bspw.core ) ) ; AccumulatorChain < double , Select < Mean, Maximum > > ac ; extractFeatures ( error_array.begin() , error_array.end() , ac ) ; { std::cout << "coefficients Mean: " << sqrt(get<Mean>(ac)) << std::endl; std::cout << "coefficients Maximum: " << sqrt(get<Maximum>(ac)) << std::endl; } // create an evaluator to obtain interpolated values typedef vspline::evaluator < double , double > ev_type ; // and set up the evaluator for the test ev_type evw ( bspw ) ; // we want to map the incoming coordinates into the defined range. // Since we're using a periodic spline, the range is from 0...N, // rather than 0...N-1 for non-periodic splines auto gate = vspline::periodic ( 0.0 , double(sz) ) ; // now we do a bit of functional programming. // we chain gate and evaluator: auto periodic_ev = gate + evw ; // we cumulate the offsets so we can 'undo' the cumulated offset // in the last iteration double cumulated_offset = 0.0 ; for ( int n = 0 ; n < iterations ; n++ ) { // use a random, largish offset (+/- 1000). any offset // will do, since we have a periodic gate, mapping the // coordinates for evaluation into the spline's range double offset = 1000.0 * dis ( gen ) ; // with the last iteration, we shift back to the original // 0-based locations. This last shift should recreate the // original signal as best as a spline of this degree can // do after so many iterations. 
if ( n == iterations - 1 ) offset = - cumulated_offset ; cumulated_offset += offset ; if ( n > ( iterations - 10 ) ) std::cout << "iteration " << n << " offset " << offset << " cumulated offset " << cumulated_offset << std::endl ; // we evaluate the spline at unit-stepped offsetted locations, // so, 0 + offset , 1 + offset ... // in the last iteration, this should ideally reproduce the original // signal. for ( int x = 0 ; x < sz ; x++ ) { auto arg = x + offset ; target [ x ] = periodic_ev ( arg ) ; // std::cout << "eval: " << arg << " -> " << target[x] << std::endl ; } // now we create a new spline over target, reusing bspw // note how this merely changes the coefficients of the spline, // the container for the coefficients is reused, and therefore // the evaluator (evw) will look at the new set of coefficients. // So we don't need to create a new evaluator. bspw.prefilter ( target ) ; // to convince ourselves that we really are working on a different // sampling of the signal - and to see how close we get to the // original signal after n iterations, when we use a last offset to get // the sampling locations back to 0, 1, ... 
// Before the 'final' result, we echo the statistics for the last ten // iterations, to demonstrate that we aren't accidentally fooling // ourselves ;) vigra::MultiArray < 1 , double > error_array ( vigra::multi_math::squaredNorm ( target - original ) ) ; AccumulatorChain < double , Select < Mean, Maximum > > ac ; extractFeatures ( error_array.begin() , error_array.end() , ac ) ; if ( n > ( iterations - 10 ) ) { if ( n == iterations - 1 ) std::cout << "final result, evaluating at original unit steps" << std::endl ; std::cout << "signal difference Mean: " << sqrt(get<Mean>(ac)) << std::endl; std::cout << "signal difference Maximum: " << sqrt(get<Maximum>(ac)) << std::endl; } } } /************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2017, 2018 by Kay F. Jahnke */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /// ca_correct.cc /// /// Perform correction of chromatic aberration using a cubic polynomial. /// This uses panotools-compatible parameters. Currently only processes /// 8bit RGB, processing is done on the sRGB data without conversion /// to linear and back, images with alpha channel won't compute. /// To see how the panotools lens correction model functions, /// please refer to https://wiki.panotools.org/Lens_correction_model /// /// compile with: /// clang++ -std=c++11 -march=native -o ca_correct -O3 -pthread -DUSE_VC ca_correct.cc -lvigraimpex -lVc /// /// you can also use g++ instead of clang++. If you don't have Vc on /// your system, omit '-DUSE_VC' and '-lVc'. /// /// invoke with ca_correct <image> ar br cr dr ag bg cg dg ab bb cb db d e /// where 'ar' stands for 'parameter a for red channel' etc., and the /// trailing d and e are center shift in x and y direction in pixels. 
/// /// The purpose here is more to demonstrate how to implement the maths /// using vspline, but the program could easily be fleshed out to /// become really usable: /// /// - add alpha channel treatment /// - add processing of 16bit data /// - add colour space management and internal linear processing // TODO: handle incoming alpha channel // TODO: differentiate between 8/16 bit // TODO: operate in linear RGB // TODO: may produce transparent output where coordinate is out-of-range // TODO might allow to parametrize normalization (currently using PT's way) // TODO: while emulating the panotools way of ca correction is nice // to have, try implementing shift-curve based correction: with // pixel's radial distance r to origin (normalized to [0,1]) // access a 1D b-spline over M data points, so that // res = ev ( r * ( M - 1 ) ) // and picking up from source at res instead of r. // the coefficients of the shift curve can be generated by sampling the // 'normal' method or any other model, it's methodically neutral, within // the fidelity of the spline to the approximated signal, which can be // taken arbitrarily high. // TODO: consider two more shift curves Cx and Cy defined over [-1,1] // then, if the incoming coordinate is(x,y), let // x' = Cx ( x ) and y' = Cy ( y ). This would introduce a tensor- // based component, usable for sensor tilt compensation (check!) 
#include <vspline/vspline.h> // pulls in all of vspline's functionality #include <iostream> // in addition to <vigra/multi_array.hxx> and <vigra/multi_math.hxx> , // which are necessarily included by vspline, we want to use vigra's // import/export facilities to load and store images: #include <vigra/stdimage.hxx> #include <vigra/imageinfo.hxx> #include <vigra/impex.hxx> // we'll be working with float data, so we set up a program-wide // constant for the vector width appropriate for float data const int VSIZE = vspline::vector_traits < float > :: size ; // we silently assume we have a colour image typedef vigra::RGBValue < float > pixel_type; // coordinate_type is a 2D coordinate typedef vigra::TinyVector < float , 2 > coordinate_type ; // type of b-spline object used as interpolator typedef vspline::bspline < pixel_type , 2 > spline_type ; // target_type is a 2D array of pixels typedef vigra::MultiArray < 2 , pixel_type > target_type ; // type of b-spline evaluator producing single floats typedef vspline::evaluator < coordinate_type , // incoming coordinate's type float // processing single channel data > ev_type ; // gate type to force singular coordinates to a range // we are using mirror boundary conditions. Note that this may // produce artifacts in the margins. typedef vspline::mirror_gate < float > gate_type ; // mapper uses two gates, for x and y component typedef vspline::map_functor < coordinate_type , VSIZE , gate_type , gate_type > mapper_type ; // we start out by coding the functor implementing // the coordinate transformation for a single channel. // we inherit from vspline::unary_functor so that our coordinate // transformation fits well into vspline's functional processing scheme struct ev_radial_correction : public vspline::unary_functor < float , float > { // incoming coordinates are shifted by dx and dy. These values are expected // in image coordinates. The shift values should be so that a pixel which // is located at the intersection of the sensor with the optical axis comes // out as (0,0). 
If the optical system is perfectly centered, dx and dy will // be the coordinates of the image center, so if the image is size X * Y, // dx = ( X - 1 ) / 2 // dy = ( Y - 1 ) / 2 const float dx , dy ; // next we have a scaling factor. Once the coordinates are shifted to // coordinates relative to the optical axis, we apply a scaling factor // which 'normalizes' the pixel's distance from the optical axis. // A typical scaling factor would be the distance of the image center // from the top left corner at (-0.5,-0.5). With dx and dy as above // dxc = dx - -0.5 ; // dyc = dy - -0.5 ; // scale = 1 / sqrt ( dx * dx + dy * dy ) ; // Here we use a different choice to be compatible with panotools: // we use the vertical distance from image center to top/bottom // margin. // Since we'll be using a polynomial over the radial distance, picking // scale values larger or smaller than this 'typical' value can be used // to affect the precise effect of the radial function. // rscale is simply the reciprocal value for faster computation. const float scale ; const float rscale ; // After applying the scale, we have a normalized coordinate. The functor will // use this normalized coordinate to calculate the normalized distance from // the optical axis. The resulting distance is the argument to the radial // correction function. For the radial correction function, we use a cubic // polynomial, which needs four coefficients: const float a , b , c , d ; // finally we have the PT d and e values, which we label x_shift and y_shift // to avoid confusion with the fourth coefficient of the polynomial. 
const float x_shift , y_shift ; // we use two static functions to concisely initialize some of the // constant values above static double d_from_extent ( double d ) { return ( d - 1.0 ) / 2.0 ; } static double rscale_from_wh ( double w , double h ) { double dx = d_from_extent ( w ) ; double dy = d_from_extent ( h ) ; // I'd normalize to the corner, but to be compatible with panotools, // I use normalization to top margin center instead. // return sqrt ( dx * dx + dy * dy ) ; return sqrt ( dy * dy ) ; } // here's the constructor for the radial correction functor, taking all // the values passed from main() and initializing the constants ev_radial_correction ( const double & _width , const double & _height , const double & _x_shift , const double & _y_shift , const double & _a , const double & _b , const double & _c , const double & _d ) : dx ( d_from_extent ( _width ) ) , dy ( d_from_extent ( _height ) ) , x_shift ( _x_shift ) , y_shift ( _y_shift ) , rscale ( rscale_from_wh ( _width , _height ) ) , scale ( 1.0 / rscale_from_wh ( _width , _height ) ) , a ( _a ) , b ( _b ) , c ( _c ) , d ( _d ) { // we echo the internal state std::cout << "dx: " << dx << std::endl ; std::cout << "dy: " << dy << std::endl ; std::cout << "scale: " << scale << std::endl ; std::cout << "rscale: " << rscale << std::endl ; std::cout << "a: " << a << std::endl ; std::cout << "b: " << b << std::endl ; std::cout << "c: " << c << std::endl ; std::cout << "d: " << d << std::endl ; } ; // now we provide evaluation code for the functor. // since the code is the same for vectorized and unvectorized // operation, we can write a template, In words: // eval is a function template with the coordinate type as it's // template argument. eval receives it's argument as a const // reference to a coordinate and deposits it's result to a reference // to a coordinate. This function will not change the state of the // functor (hence the const) - the functor does not have mutable state // anyway. 
Note how CRD can be a single coordinate_type, or it's // vectorized equivalent. template < class CRD > void eval ( const CRD & in , CRD & result ) const { // set up coordinate-type variable to work on, copy input to it. CRD cc ( in ) ; // shift and scale // TODO: is it right to add the shift here, or should I subtract cc[0] -= ( dx + x_shift ) ; cc[0] *= scale ; cc[1] -= ( dy + y_shift ) ; cc[1] *= scale ; // calculate distance from center (this is normalized due to scaled cc) auto r = sqrt ( cc[0] * cc[0] + cc[1] * cc[1] ) ; // apply polynomial to obtain the scaling factor. auto rr = a * r * r * r + b * r * r + c * r + d ; // use rr to scale cc - this is the radial correction cc[0] *= rr ; cc[1] *= rr ; // apply rscale to revert to centered image coordinates cc[0] *= rscale ; cc[1] *= rscale ; // reverse initial shift to arrive at UL-based image coordinates cc[0] += ( dx + x_shift ) ; cc[1] += ( dy + y_shift ) ; // assign to result result = cc ; } } ; // next we set up the functor processing all three channels. This functor // receives three ev_radial_correction functors and three channel views: struct ev_ca_correct : public vspline::unary_functor < coordinate_type , pixel_type > { // these three functors hold the radial corrections for the three // colour channels ev_radial_correction rc_red ; ev_radial_correction rc_green ; ev_radial_correction rc_blue ; // these three functors hold interpolators for the colour channels ev_type ev_red ; ev_type ev_green ; ev_type ev_blue ; // and this object deals with out-of-bounds coordinates mapper_type m ; // the constructor receives all the functors we'll use. Note how we // can simply copy-construct the functors. 
ev_ca_correct ( const ev_radial_correction & _rc_red , const ev_radial_correction & _rc_green , const ev_radial_correction & _rc_blue , const ev_type & _ev_red , const ev_type & _ev_green , const ev_type & _ev_blue , const mapper_type & _m ) : rc_red ( _rc_red ) , rc_green ( _rc_green ) , rc_blue ( _rc_blue ) , ev_red ( _ev_red ) , ev_green ( _ev_green ) , ev_blue ( _ev_blue ) , m ( _m ) { } ; // the eval routine is simple, it simply applies the coordinate // transformation, applies the mapper to force the transformed // coordinate into the range, an then picks the interpolated value // using the interpolator for the channel. This is done for all // channels in turn. // since the code is the same for vectorized and unvectorized // operation, we can again write a template: template < class IN , class OUT > void eval ( const IN & c , OUT & result ) const { // work variable containing a (possibly vectorized) 2D coordinate IN cc ; // apply the radial correction to the incoming coordinate in c, // storing result to cc. 
Note that c contains the 'target' coordinate: // The coordinate of the pixel in the target which we want to compute rc_red.eval ( c , cc ) ; // force coordinate into the defined range (here we use mirroring) m.eval ( cc , cc ) ; // evaluate channel view at corrected coordinate, storing result // to the red channel of 'result' ev_red.eval ( cc , result[0] ) ; // ditto, for the remaining channels rc_green.eval ( c , cc ) ; m.eval ( cc , cc ) ; ev_green.eval ( cc , result[1] ) ; rc_blue.eval ( c , cc ) ; m.eval ( cc , cc ) ; ev_blue.eval ( cc , result[2] ) ; } } ; int main ( int argc , char * argv[] ) { if ( argc < 16 ) { std::cerr << "pass a colour image file as first argument" << std::endl ; std::cerr << "followed by a, b, c for red, green, blue" << std::endl ; std::cerr << "and the horizontal and vertical shift" << std::endl ; std::cerr << "like ca_correct <image> 0.0001411 -0.0005236 0.0008456 1.0002093 0 0 0 1 0.0002334 -0.0007607 0.0011446 0.9996757 176 116" << std::endl ; exit( -1 ) ; } double ar = atof ( argv[2] ) ; double br = atof ( argv[3] ) ; double cr = atof ( argv[4] ) ; double dr = atof ( argv[5] ) ; double ag = atof ( argv[6] ) ; double bg = atof ( argv[7] ) ; double cg = atof ( argv[8] ) ; double dg = atof ( argv[9] ) ; double ab = atof ( argv[10] ) ; double bb = atof ( argv[11] ) ; double cb = atof ( argv[12] ) ; double db = atof ( argv[13] ) ; double x_shift = atof ( argv[14] ) ; // aka panotools 'd' double y_shift = atof ( argv[15] ) ; // aka panotools 'e' // get the image file name vigra::ImageImportInfo imageInfo ( argv[1] ) ; // we want a b-spline with natural boundary conditions vigra::TinyVector < vspline::bc_code , 2 > bcv ( vspline::NATURAL ) ; // create cubic 2D b-spline object containing the image data // TODO allow passing in arbitrary spline order spline_type bspl ( imageInfo.shape() , // the shape of the data for the spline 5 , // degree 5 == quintic spline bcv // specifies natural BCs along both axes ) ; // load the image data into the 
// b-spline's core. This is a common idiom:
  // the spline's 'core' is a MultiArrayView to that part of the spline's
  // data container which precisely corresponds with the input data.
  // This saves loading the image to some memory first and then transferring
  // the data into the spline. Since the core is a vigra::MultiArrayView,
  // we can pass it to importImage as the desired target for loading the
  // image from disk.

  std::cout << "reading image " << argv[1] << " from disk" << std::endl ;

  vigra::importImage ( imageInfo , bspl.core ) ;

  // prefilter the b-spline

  std::cout << "setting up b-spline interpolator for image data" << std::endl ;

  bspl.prefilter() ;

  // this is where the result should go:

  target_type target ( imageInfo.shape() ) ;

  // process the image metrics

  float width = imageInfo.width() ;
  float height = imageInfo.height() ;

  // set up the radial transformation functors

  std::cout << "setting up radial correction for red channel:" << std::endl ;
  ev_radial_correction ca_red ( width , height , x_shift , y_shift , ar , br , cr , dr ) ;

  std::cout << "setting up radial correction for green channel:" << std::endl ;
  ev_radial_correction ca_green ( width , height , x_shift , y_shift , ag , bg , cg , dg ) ;

  std::cout << "setting up radial correction for blue channel:" << std::endl ;
  ev_radial_correction ca_blue ( width , height , x_shift , y_shift , ab , bb , cb , db ) ;

  // here we create the channel views.
auto red_channel = bspl.get_channel_view ( 0 ) ; auto green_channel = bspl.get_channel_view ( 1 ) ; auto blue_channel = bspl.get_channel_view ( 2 ) ; // and set up the per-channel interpolators ev_type red_ev ( red_channel ) ; ev_type green_ev ( green_channel ) ; ev_type blue_ev ( blue_channel ) ; // next we set up coordinate mapping to the defined range gate_type g_x ( 0.0 , width - 1.0 ) ; gate_type g_y ( 0.0 , height - 1.0 ) ; mapper_type m ( g_x , g_y ) ; // using vspline's factory functions to create the 'gates' and the // 'mapper' applying them, we could instead create m like this: // (Note how we have to be explicit about using 'float' // for the arguments to the gates - using double arguments would not // work here unless we'd also specify the vector width.) // auto m = vspline::mapper < coordinate_type > // ( vspline::mirror ( 0.0f , width - 1.0f ) , // vspline::mirror ( 0.0f , height - 1.0f ) ) ; // finally, we create the top-level functor, passing in the three // radial correction functors, the channel-wise evaluators and the // mapper object ev_ca_correct correct ( ca_red , ca_green , ca_blue , red_ev , green_ev , blue_ev , m ) ; // now we obtain the result by performing a vspline::transform. 
// this transform
  // successively passes discrete coordinates in the target to the functor
  // it's invoked with, storing the result of the functor's evaluation
  // at the self-same coordinates in its target, so for each coordinate
  // (X,Y), target[(X,Y)] = correct(X,Y)

  std::cout << "rendering the target image" << std::endl ;

  vspline::transform ( correct , target ) ;

  // store the result with vigra impex

  std::cout << "storing the target image as 'ca_correct.tif'" << std::endl ;

  vigra::ImageExportInfo eximageInfo ( "ca_correct.tif" );

  vigra::exportImage ( target ,
                       eximageInfo
                       .setPixelType("UINT8")
                       .setForcedRangeMapping ( 0 , 255 , 0 , 255 ) ) ;

  std::cout << "done" << std::endl ;
}

kfj-vspline-4b365417c271/example/channels.cc

/************************************************************************/
/*  vspline - a set of generic tools for creation and evaluation       */
/*  of uniform b-splines                                                */
/*  Copyright 2016 - 2018 by Kay F. Jahnke                              */
/*  licensed under the permission notice in the LICENSE file at the     */
/*  repository root                                                     */
/************************************************************************/

/// \file channels.cc
///
/// \brief demonstrates the use of 'channel views'
///
/// This example is derived from 'slice.cc'; we use the same volume
/// as source data. But instead of producing an image output, we create
/// three separate colour channels of the bspline object and assert that
/// the evaluation of the channel views is identical to the evaluation
/// of the 'mother' spline.
/// For a more involved example using channel views, see ca_correct.cc
///
/// compile with:
/// clang++ -std=c++11 -march=native -o channels -O3 -pthread -DUSE_VC channels.cc -lvigraimpex -lVc
///
/// If you don't have Vc on your system, use
///
/// clang++ -std=c++11 -march=native -o channels -O3 -pthread channels.cc -lvigraimpex

#include <iostream>
#include <assert.h>
#include <vspline/vspline.h>
#include <vigra/multi_array.hxx>
#include <vigra/impex.hxx>

int main ( int argc , char * argv[] )
{
  // pixel_type is the result type, an RGB float pixel
  typedef vigra::TinyVector < float , 3 > pixel_type ;

  // voxel_type is the source data type
  typedef vigra::TinyVector < float , 3 > voxel_type ;

  // coordinate_type has a 3D coordinate
  typedef vigra::TinyVector < float , 3 > coordinate_type ;

  // warp_type is a 2D array of coordinates
  typedef vigra::MultiArray < 2 , coordinate_type > warp_type ;

  // target_type is a 2D array of pixels
  typedef vigra::MultiArray < 2 , pixel_type > target_type ;

  // we want a b-spline with natural boundary conditions
  vigra::TinyVector < vspline::bc_code , 3 > bcv ( vspline::NATURAL ) ;

  // create quintic 3D b-spline object containing voxels
  vspline::bspline < voxel_type , 3 >
    space ( vigra::Shape3 ( 10 , 10 , 10 ) , 5 , bcv ) ;

  // here we create the channel views.
Since these are merely views // to the same data, no data will be copied, and it doesn't matter // whether we create these views before or after prefiltering. auto red_channel = space.get_channel_view ( 0 ) ; auto green_channel = space.get_channel_view ( 1 ) ; auto blue_channel = space.get_channel_view ( 2 ) ; // fill the b-spline's core with a three-way gradient for ( int z = 0 ; z < 10 ; z++ ) { for ( int y = 0 ; y < 10 ; y++ ) { for ( int x = 0 ; x < 10 ; x++ ) { voxel_type & c ( space.core [ vigra::Shape3 ( x , y , z ) ] ) ; c[0] = 25.5 * x ; c[1] = 25.5 * y ; c[2] = 25.5 * z ; } } } // prefilter the b-spline space.prefilter() ; // now make a warp array with 1920X1080 3D coordinates warp_type warp ( vigra::Shape2 ( 1920 , 1080 ) ) ; // we want the coordinates to follow this scheme: // warp(x,y) = (x,1-x,y) // scaled appropriately for ( int y = 0 ; y < 1080 ; y++ ) { for ( int x = 0 ; x < 1920 ; x++ ) { coordinate_type & c ( warp [ vigra::Shape2 ( x , y ) ] ) ; c[0] = float ( x ) / 192.0 ; c[1] = 10.0 - c[0] ; c[2] = float ( y ) / 108.0 ; } } // get an evaluator for the b-spline typedef vspline::evaluator < coordinate_type , voxel_type > ev_type ; ev_type ev ( space ) ; // the evaluators of the channel views have their own type: typedef vspline::evaluator < coordinate_type , float > ch_ev_type ; // we create the three evaluators for the three channel views ch_ev_type red_ev ( red_channel ) ; ch_ev_type green_ev ( green_channel ) ; ch_ev_type blue_ev ( blue_channel ) ; // and make sure the evaluation results match for ( int y = 0 ; y < 1080 ; y++ ) { for ( int x = 0 ; x < 1920 ; x++ ) { coordinate_type & c ( warp [ vigra::Shape2 ( x , y ) ] ) ; assert ( ev ( c ) [ 0 ] == red_ev ( c ) ) ; assert ( ev ( c ) [ 1 ] == green_ev ( c ) ) ; assert ( ev ( c ) [ 2 ] == blue_ev ( c ) ) ; } } std::cout << "success" << std::endl ; exit ( 0 ) ; } 
kfj-vspline-4b365417c271/example/complex.cc

/************************************************************************/
/*  vspline - a set of generic tools for creation and evaluation       */
/*  of uniform b-splines                                                */
/*  Copyright 2015 - 2018 by Kay F. Jahnke                              */
/*  licensed under the permission notice in the LICENSE file at the     */
/*  repository root                                                     */
/************************************************************************/

/// \file complex.cc
///
/// \brief demonstrate use of b-spline over std::complex data
///
/// vspline handles std::complex data like pairs of the complex
/// type's value_type, and uses a vigra::TinyVector of two
/// simdized value_types as the vectorized type.
///
/// compile: clang++ -std=c++11 -march=native -o complex -O3 -pthread -DUSE_VC complex.cc -lvigraimpex -lVc
///
/// If you don't have Vc on your system, use
///
/// clang++ -std=c++11 -march=native -o complex -O3 -pthread complex.cc -lvigraimpex

#include <iostream>
#include <iomanip>
#include <complex>
#include <assert.h>
#include <vspline/vspline.h>
#include <vigra/multi_array.hxx>

int main ( int argc , char * argv[] )
{
  // nicely formatted output

  std::cout << std::fixed << std::showpoint
            << std::showpos << std::setprecision(6) ;

  // create default b-spline over 100 values

  vspline::bspline < std::complex < float > , 1 > bsp ( 100 ) ;

  // get a vigra::MultiArrayView to the spline's 'core'. This is the
  // area corresponding with the original data and is filled with zero.

  auto v1 = bsp.core ;

  // we only set one single value in the middle of the array

  // g++ does not like this:
  // v1 [ 50 ] = std::complex < float > ( 3.3 + 2.2i ) ;

  v1 [ 50 ] = std::complex < float > ( 3.3 , 2.2 ) ;

  // now we convert the original data to b-spline coefficients
  // by calling prefilter()

  bsp.prefilter() ;

  // and create an evaluator over the spline. Here we pass three
  // template arguments to the evaluator's type declaration:
  // - float for the 'incoming type' (coordinates)
  // - std::complex < float > for the 'outgoing type' (values)
  // - 4 for the vector width, just for this example

  typedef vspline::evaluator < float , std::complex < float > , 4 > ev_type ;

  // create the evaluator

  auto ev = ev_type ( bsp ) ;

  // now we evaluate the spline in the region around the single
  // nonzero value in the input data and print argument and result

  float in ;
  std::complex < float > out ;

  for ( int k = -12 ; k < 13 ; k++ )
  {
    in = 50.0 + k * 0.1 ;

    // use ev's eval() method:

    ev.eval ( in , out ) ;
    std::cout << "1. ev(" << in << ") = " << out << std::endl ;

    // alternatively, use ev as a callable

    std::cout << "2. ev(" << in << ") = " << ev(in) << std::endl ;

    // ditto, but feed immediates. note we're passing float explicitly

    std::cout << "3.
ev(" << in << ") = " << ev(50.0f + k * 0.1f) << std::endl ;
  }

  // repeat the example evaluation with vector data

  for ( int k = -12 ; k < 13 ; k++ )
  {
    // feed the evaluator with vectors. Note how we obtain the
    // type of vectorized data the evaluator will accept from
    // the evaluator, by querying for its 'in_v' type. When
    // using Vc (-DUSE_VC), the vectorized input type is a
    // Vc::SimdArray of four floats. The result appears as a
    // vigra::TinyVector of two Vc::SimdArrays of four floats,
    // the vectorized type which vspline deems appropriate for
    // complex data: one SimdArray for the real parts, one
    // SimdArray for the imaginary parts.
    // when Vc is not used, the same happens with simd_tv
    // instead of Vc::SimdArray.

    typename ev_type::in_v vk ( 50.0 + k * 0.1 ) ;

    std::cout << "ev(" << vk << ") = " << ev(vk) << std::endl ;
  }
}

kfj-vspline-4b365417c271/example/eval.cc

/************************************************************************/
/*  vspline - a set of generic tools for creation and evaluation       */
/*  of uniform b-splines                                                */
/*  Copyright 2015 - 2018 by Kay F. Jahnke                              */
/*  licensed under the permission notice in the LICENSE file at the     */
/*  repository root                                                     */
/************************************************************************/

/// \file eval.cc
///
/// \brief simple demonstration of creation and evaluation of a b-spline
///
/// takes a set of knot point values from std::cin, calculates a 1D b-spline
/// over them, and evaluates it at coordinates taken from std::cin.
/// The output shows how the coordinate is split into integral and real
/// part and the result of evaluating the spline at this point.
/// Note how the coordinate is automatically folded into the defined range.
///
/// Two evaluations are demonstrated, using unvectorized and vectorized
/// input/output.
///
/// Since this is a convenient place to add code for testing evaluation
/// speed, you may pass a number on the command line. eval.cc will then
/// perform as many evaluations and print the time it took.
/// The evaluations will be distributed to several threads, so there is
/// quite a bit of overhead; pass numbers from 1000000 up to get realistic
/// timings.
///
/// compile: clang++ -std=c++11 -o eval -pthread eval.cc

#include <iostream>
#include <iomanip>
#include <vector>
#include <chrono>
#include <vspline/vspline.h>

// you can use float, but then can't use very high spline degrees.
// If you use long double, the code won't be vectorized in hardware.

typedef double dtype ;
typedef double rc_type ;

enum { vsize = vspline::vector_traits < dtype > :: vsize } ;

// speed_test will perform as many vectorized evaluations as the
// range it receives spans.
// The evaluation is always at the same place,
// trying to lower all memory access effects to the minimum.

template < class ev_type >
void speed_test ( vspline::index_range_type range ,
                  ev_type * pev )
{
  typename ev_type::in_v in = dtype(.111) ;
  typename ev_type::out_v out ;
  auto & ev = *pev ;

  for ( long n = range[0] ; n < range[1] ; n++ )
    ev.eval ( in , out ) ;
}

int main ( int argc , char * argv[] )
{
  long TIMES = 0 ;
  if ( argc > 1 )
    TIMES = std::atol ( argv[1] ) ;

  // get the spline degree and boundary conditions from the console

  std::cout << "enter spline degree: " ;
  int spline_degree ;
  std::cin >> spline_degree ;

  int bci = -1 ;
  vspline::bc_code bc ;

  while ( bci < 1 || bci > 4 )
  {
    std::cout << "choose boundary condition" << std::endl ;
    std::cout << "1) MIRROR" << std::endl ;
    std::cout << "2) PERIODIC" << std::endl ;
    std::cout << "3) REFLECT" << std::endl ;
    std::cout << "4) NATURAL" << std::endl ;
    std::cin >> bci ;
  }

  switch ( bci )
  {
    case 1 : bc = vspline::MIRROR ; break ;
    case 2 : bc = vspline::PERIODIC ; break ;
    case 3 : bc = vspline::REFLECT ; break ;
    case 4 : bc = vspline::NATURAL ; break ;
  }

  // obtain knot point values

  dtype v ;
  std::vector < dtype > dv ;
  std::cout << "enter knot point values (end with EOF)" << std::endl ;

  while ( std::cin >> v )
    dv.push_back ( v ) ;

  std::cin.clear() ;

  // fix the type for the bspline object

  typedef vspline::bspline < dtype , 1 > spline_type ;

  spline_type bsp ( dv.size() , spline_degree , bc ) ;
  std::cout << "created bspline object:" << std::endl << bsp << std::endl ;

  // fill the data into the spline's 'core' area

  for ( size_t i = 0 ; i < dv.size() ; i++ )
    bsp.core[i] = dv[i] ;

  // prefilter the data

  bsp.prefilter() ;

  std::cout << std::fixed << std::showpoint << std::setprecision(12) ;
  std::cout << "spline coefficients (with frame)" << std::endl ;

  for ( auto& coeff : bsp.container )
    std::cout << " " << coeff << std::endl ;

  // obtain a 'safe' evaluator which folds incoming coordinates
  // into the defined range

  auto ev =
vspline::make_safe_evaluator ( bsp ) ;

  int ic ;
  rc_type rc ;
  dtype res ;

  std::cout << "enter coordinates to evaluate (end with RET EOF)" << std::endl ;

  while ( ! std::cin.eof() )
  {
    // get a coordinate

    std::cin >> rc ;

    if ( rc < bsp.lower_limit(0) || rc > bsp.upper_limit(0) )
    {
      std::cout << "warning: " << rc
                << " is outside the spline's defined range." << std::endl ;
      std::cout << "using automatic folding to process this value."
                << std::endl ;
    }

    // evaluate the spline at this location

    ev.eval ( rc , res ) ;
    std::cout << rc << " -> " << res << std::endl ;

    // now we obtain 'in_v' and 'out_v' from the evaluator.
    // This may turn out a Vc::SimdArray or a vspline::simd_tv,
    // depending on the flags this program was compiled with:

    typedef decltype ( ev ) ev_t ;
    typedef typename ev_t::in_v vcrd_t ;
    typedef typename ev_t::out_v vres_t ;

    vcrd_t vv ( rc ) ;
    vres_t vres ;

    ev.eval ( vv , vres ) ;
    std::cout << "evaluation of the functor's vectorized type:" << std::endl ;
    std::cout << vv << " -> " << vres << std::endl ;

    if ( TIMES )
    {
      std::chrono::system_clock::time_point start
        = std::chrono::system_clock::now() ;

      vspline::index_range_type range ( 0 , TIMES / vsize ) ;

      auto partitioning = vspline::index_range_splitter::part
        ( range , vspline::default_njobs , 1 ) ;

      // for the speed test we build a plain evaluator; we'll
      // only be evaluating the spline near 0.0 repeatedly, so
      // we don't need folding into the safe range. We're not
      // fixing 'specialize', so evaluation will be general
      // b-spline evaluation, even for degree 0 and 1 splines.

      auto ev = vspline::evaluator < dtype , dtype , vsize > ( bsp ) ;

      vspline::multithread ( &speed_test , partitioning , &ev ) ;

      std::chrono::system_clock::time_point end
        = std::chrono::system_clock::now() ;

      std::cout << TIMES << " evaluations took "
                << std::chrono::duration_cast < std::chrono::milliseconds >
                   ( end - start ) .
count() << " ms" << std::endl ;
    }
  }
}

kfj-vspline-4b365417c271/example/examples.sh

#! /bin/bash

# compile all examples

for f in $@
do
  body=$(basename $f .cc)
  common_flags="-O3 -Wno-abi -std=c++11 -march=native -mavx2 -pthread"
  # avx_flags="-O3 -Wno-abi -std=c++11 -mavx -pthread"
  # use these flags to avoid multithreading:
  # -DVSPLINE_SINGLETHREAD
  for compiler in clang++ # g++
  do
    echo compiling $body with:
    echo $compiler $common_flags -otv_$body $f -lvigraimpex
    $compiler $common_flags -otv_$body $f -lvigraimpex
    echo $compiler -DUSE_VC $common_flags -o$body $f -lVc -lvigraimpex
    $compiler -DUSE_VC $common_flags -o$body $f -lVc -lvigraimpex
  done
done

kfj-vspline-4b365417c271/example/gradient.cc

/************************************************************************/
/*  vspline - a set of generic tools for creation and evaluation       */
/*  of uniform b-splines                                                */
/*  Copyright 2015 - 2018 by Kay F. Jahnke                              */
/*  licensed under the permission notice in the LICENSE file at the     */
/*  repository root                                                     */
/************************************************************************/

/// \file gradient.cc
///
/// \brief evaluating a specific spline, derivatives, precision
///
/// If we create a b-spline over an array containing, at each grid point,
/// the sum of the grid point's coordinates, each 1D row, column, etc. will
/// hold a linear gradient with first derivative == 1. If we use NATURAL
/// BCs, evaluating the spline with real coordinates anywhere inside the
/// defined range should produce precisely the sum of the coordinates.
/// This is a good test for both the precision of the evaluation and its
/// correct functioning, particularly with higher-D arrays.
///
/// compile: clang++ -O3 -DUSE_VC -march=native -std=c++11 -pthread -o gradient gradient.cc -lVc
///
/// or clang++ -O3 -march=native -std=c++11 -pthread -o gradient gradient.cc

#include <iostream>
#include <random>
#include <vspline/vspline.h>

int main ( int argc , char * argv[] )
{
  typedef vspline::bspline < double , 3 > spline_type ;
  typedef typename spline_type::shape_type shape_type ;
  typedef typename spline_type::view_type view_type ;
  typedef typename spline_type::bcv_type bcv_type ;

  // let's have a knot point array with nicely odd shape

  shape_type core_shape = { 35 , 43 , 19 } ;

  // we have to use a longish call to the constructor since we want to pass
  // 0.0 to 'tolerance' and it's way down in the argument list, so we have to
  // explicitly pass a few arguments which usually take default values before
  // we have a chance to pass the tolerance

  spline_type bspl ( core_shape ,                    // shape of knot point array
                     3 ,                             // cubic b-spline
                     bcv_type ( vspline::NATURAL ) , // natural boundary conditions
                     0.0 ) ;                         // no tolerance

  // get a view to the bspline's core, to
fill it with data view_type core = bspl.core ; // create the gradient in each dimension for ( int d = 0 ; d < bspl.dimension ; d++ ) { for ( int c = 0 ; c < core_shape[d] ; c++ ) core.bindAt ( d , c ) += c ; } // now prefilter the spline bspl.prefilter() ; // set up the evaluator type typedef vigra::TinyVector < double , 3 > coordinate_type ; typedef vspline::evaluator < coordinate_type , double > evaluator_type ; // we also want to verify the derivative along each axis typedef typename evaluator_type::derivative_spec_type deriv_t ; deriv_t dsx , dsy , dsz ; dsx[0] = 1 ; // first derivative along axis 0 dsy[1] = 1 ; // first derivative along axis 1 dsz[2] = 1 ; // first derivative along axis 2 // set up the evaluator for the underived result evaluator_type ev ( bspl ) ; // and evaluators for the three first derivatives evaluator_type ev_dx ( bspl , dsx ) ; evaluator_type ev_dy ( bspl , dsy ) ; evaluator_type ev_dz ( bspl , dsz ) ; // we want to bombard the evaluator with random in-range coordinates std::random_device rd; std::mt19937 gen(rd()); // std::mt19937 gen(12345); // fix starting value for reproducibility coordinate_type c ; // here comes our test, feed 100 random 3D coordinates and compare the // evaluator's result with the expected value, which is precisely the // sum of the coordinate's components. The printout of the derivatives // is boring: it's always 1. But this assures us that the b-spline is // perfectly plane, even off the grid points. 
for ( int times = 0 ; times < 100 ; times++ )
  {
    for ( int d = 0 ; d < bspl.dimension ; d++ )
      c[d] = ( core_shape[d] - 1 )
             * std::generate_canonical < double , 20 > ( gen ) ;

    double result ;
    ev.eval ( c , result ) ;
    double delta = result - sum ( c ) ;

    std::cout << "eval(" << c << ") = " << result
              << " -> delta = " << delta << std::endl ;

    ev_dx.eval ( c , result ) ;
    std::cout << "dx: " << result ;
    ev_dy.eval ( c , result ) ;
    std::cout << " dy: " << result ;
    ev_dz.eval ( c , result ) ;
    std::cout << " dz: " << result << std::endl ;
  }
}

kfj-vspline-4b365417c271/example/gradient2.cc

/************************************************************************/
/*  vspline - a set of generic tools for creation and evaluation       */
/*  of uniform b-splines                                                */
/*  Copyright 2015, 2016 by Kay F. Jahnke                               */
/*  licensed under the permission notice in the LICENSE file at the     */
/*  repository root                                                     */
/************************************************************************/

/// gradient2.cc
///
/// If we create a b-spline over an array containing, at each grid point,
/// the sum of the grid point's coordinates, each 1D row, column, etc. will
/// hold a linear gradient with first derivative == 1. If we use NATURAL
/// BCs, evaluating the spline with real coordinates anywhere inside the
/// defined range should produce precisely the sum of the coordinates.
/// This is a good test for both the precision of the evaluation and its
/// correct functioning, particularly with higher-D arrays.
///
/// In this variant of the program, we use a vspline::domain functor.
/// This functor provides a handy way to access a b-spline with normalized
/// coordinates: instead of evaluating coordinates in the range of
/// [ 0 , core_shape - 1 ] (natural spline coordinates), we pass coordinates
/// in the range of [ 0 , 1 ] to the domain object, which is chained to
/// the evaluator and passes natural spline coordinates to it.
///
/// compile: clang++ -O3 -DUSE_VC -march=native -std=c++11 -pthread -o gradient2 gradient2.cc -lVc
///
/// or clang++ -O3 -march=native -std=c++11 -pthread -o gradient2 gradient2.cc

#include <iostream>
#include <random>
#include <vspline/vspline.h>

int main ( int argc , char * argv[] )
{
  typedef vspline::bspline < double , 3 > spline_type ;
  typedef typename spline_type::shape_type shape_type ;
  typedef typename spline_type::view_type view_type ;
  typedef typename spline_type::bcv_type bcv_type ;

  // let's have a knot point array with nicely odd shape

  shape_type core_shape = { 35 , 43 , 19 } ;

  // we have to use a longish call to the constructor since we want to pass
  // 0.0 to 'tolerance' and it's way down in the argument list, so we have to
  // explicitly pass a few arguments which usually take default values before
  // we have a chance to pass the tolerance

  spline_type bspl ( core_shape ,                    // shape of knot point array
                     3 ,                             // cubic b-spline
                     bcv_type ( vspline::NATURAL ) , // natural boundary conditions
                     0.0 ) ;                         // no tolerance

  // get a view to the bspline's core, to fill it with data

  view_type core = bspl.core ;

  // create the gradient in each dimension

  for ( int d = 0 ; d < bspl.dimension ; d++ )
  {
    for ( int c = 0 ; c < core_shape[d] ; c++ )
      core.bindAt ( d , c ) += c ;
  }

  // now prefilter the spline

  bspl.prefilter() ;

  // set up an evaluator

  typedef vigra::TinyVector < double , 3 > coordinate_type ;
  typedef vspline::evaluator < coordinate_type , double > evaluator_type ;

  evaluator_type inner_ev ( bspl ) ;

  // create the domain from the bspline:

  auto dom = vspline::domain < coordinate_type > ( bspl ) ;

  // chain domain and inner evaluator

  auto ev = dom + inner_ev ;

  // we want to bombard the evaluator with random in-range coordinates

  std::random_device rd;
  std::mt19937 gen(rd());
  // std::mt19937 gen(12345); // fix starting value for reproducibility

  coordinate_type c ;

  // here comes our test, feed 100 random 3D coordinates and compare the
  // evaluator's result with the expected value, which is precisely the
  // sum
of the coordinate's components for ( int times = 0 ; times < 100 ; times++ ) { for ( int d = 0 ; d < bspl.dimension ; d++ ) { // note the difference to the code in gardient.cc here: // our test coordinates are in the range of [ 0 , 1 ] c[d] = std::generate_canonical(gen) ; } double result ; ev.eval ( c , result ) ; // 'result' is the same as in gradient.cc, but we have to calculate // the expected value differently. double delta = result - sum ( c * ( core_shape - 1 ) ) ; std::cout << "eval(" << c << ") = " << result << " -> delta = " << delta << std::endl ; } } kfj-vspline-4b365417c271/example/grind.cc000066400000000000000000000216351333775006700177670ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* The git repository for this software is at */ /* */ /* https://bitbucket.org/kfj/vspline */ /* */ /* Please direct questions, bug reports, and contributions to */ /* */ /* kfjahnke+vspline@gmail.com */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /*! \file grind.cc \brief performance test. The test is twofold: one aspect is the 'fidelity' of the filtering operations: if we repeatedly prefilter and restore the data, how badly do they suffer, given different data types? The second question is, how long does a prefilter-restore cycle take (roughly)? */ #include <vspline/vspline.h> #include <vigra/multi_array.hxx> #include <vigra/multi_math.hxx> #include <vigra/accumulator.hxx> #include <iostream> #include <random> #include <chrono> #include <typeinfo> #include <complex> #include <cassert> bool verbose = true ; // false ; // 'condense' aggregate types (TinyVectors etc.) into a single value template < typename T > double condense ( const T & t , std::true_type ) { return std::abs ( t ) ; } template < typename T > double condense ( const T & t , std::false_type ) { return sqrt ( sum ( t * t ) ) / t.size() ; } template < typename T > double condense ( const std::complex < T > & t , std::false_type ) { return std::abs ( t ) ; } template < class T > using is_singular = typename std::conditional < std::is_fundamental < T > :: value , std::true_type , std::false_type > :: type ; template < typename T > double condense ( const T & t ) { return condense ( t , is_singular < T > () ) ; } template < int dim , typename T > double check_diff ( vigra::MultiArrayView < dim , T > & reference , vigra::MultiArrayView < dim , T > & candidate ) { using namespace vigra::multi_math ; using namespace vigra::acc; assert ( reference.shape() == candidate.shape() ) ; vigra::MultiArray < 1 , double > error_array ( vigra::Shape1(reference.size() ) ) ; for ( int i = 0 ; i < reference.size() ; i++ ) { auto help = candidate[i] - reference[i] ; // std::cerr << reference[i] << " <> " << candidate[i] // << " CFD " << help << std::endl ; error_array [ i ] =
condense ( help ) ; } } AccumulatorChain < double , Select < Mean, Maximum > > ac ; extractFeatures ( error_array.begin() , error_array.end() , ac ) ; double mean = get < Mean > ( ac ) ; double max = get < Maximum > ( ac ) ; if ( verbose ) { std::cout << "delta Mean: " << mean << std::endl; std::cout << "delta Maximum: " << max << std::endl; } return max ; } /// create an array of random data distributed between -1 and 1. /// Then repeatedly prefilter and restore the data, and check the /// difference between the restored and original data, plus the /// time needed to do the processing. /// This test shows that for higher-degree splines, arithmetic /// precision becomes an issue, and using floats becomes futile. /// But it also shows that using double data and operating in /// double precision does not cost too much more processing time. /// keeping the data in float and using double maths does not /// help. template < int dim , typename T , typename math_ele_type > void grind_test ( vigra::TinyVector < int , dim > shape , vspline::bc_code bc , int spline_degree ) { typedef vigra::MultiArray < dim , T > array_type ; typedef vspline::bspline < T , dim > spline_type ; vigra::TinyVector < vspline::bc_code , dim > bcv { bc } ; spline_type bsp ( shape , spline_degree , bcv ) ; // , 0.0 ) ; // if ( verbose ) // std::cout << "grind: created b-spline:" << std::endl << bsp << std::endl ; std::random_device rd ; std::mt19937 gen ( rd() ) ; // gen.seed ( 765 ) ; // level playing field std::uniform_real_distribution<> dis ( -1.0 , 1.0 ) ; for ( auto & e : bsp.core ) e = dis ( gen ) ; // fill core with random data array_type reference = bsp.core ; // hold a copy of these data int times ; std::chrono::system_clock::time_point start = std::chrono::system_clock::now() ; for ( times = 0 ; times < 1000 ; ) { bsp.prefilter() ; vspline::restore < dim , T , math_ele_type > ( bsp ) ; times++ ; if ( times == 1 || times == 10 || times == 100 || times == 1000 ) { std::cout << "after " << times << " prefilter-restore
cycles" << std::endl ; double emax = check_diff < dim , T > ( reference , bsp.core ) ; if ( verbose ) { // print a summary, can use '| grep CF' on cout std::cout << typeid(T()).name() << " CF " << " D " << dim << " " << bsp.core.shape() << " BC " << vspline::bc_name[bc] << " DG " << spline_degree << " TOL " << bsp.tolerance << " EMAX " << emax << std::endl ; } } } std::chrono::system_clock::time_point end = std::chrono::system_clock::now() ; std::cout << "average cycle duration: " << std::chrono::duration_cast < std::chrono::milliseconds > ( end - start ) . count() / float ( times ) << " ms" << std::endl ; std::cout << "---------------------------------------------------" << std::endl ; std::cout << std::endl ; } int main ( int argc , char * argv[] ) { if ( argc < 2 ) { std::cerr << "pass the spline degree as parameter" << std::endl ; exit ( -1 ) ; } int degree = std::atoi ( argv[1] ) ; std::cout << "performing arithmetics in single precision" << std::endl ; grind_test < 2 , float , float > ( { 1000 , 1000 } , vspline::PERIODIC , degree ) ; grind_test < 2 , double , float > ( { 1000 , 1000 } , vspline::PERIODIC , degree ) ; grind_test < 2 , long double , float > ( { 1000 , 1000 } , vspline::PERIODIC , degree ) ; std::cout << "performing arithmetics in double precision" << std::endl ; grind_test < 2 , float , double > ( { 1000 , 1000 } , vspline::PERIODIC , degree ) ; grind_test < 2 , double , double > ( { 1000 , 1000 } , vspline::PERIODIC , degree ) ; grind_test < 2 , long double , double > ( { 1000 , 1000 } , vspline::PERIODIC , degree ) ; std::cout << "performing arithmetics in long double precision" << std::endl ; grind_test < 2 , float , long double > ( { 1000 , 1000 } , vspline::PERIODIC , degree ) ; grind_test < 2 , double , long double > ( { 1000 , 1000 } , vspline::PERIODIC , degree ) ; grind_test < 2 , long double , long double > ( { 1000 , 1000 } , vspline::PERIODIC , degree ) ; }
kfj-vspline-4b365417c271/example/grok.cc000066400000000000000000000100101333775006700176070ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2017 - 2018 by Kay F. Jahnke */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. 
*/ /* */ /************************************************************************/ /// \file grok.cc /// /// \brief demonstrates use of vspline::grok_type #include <vspline/vspline.h> #include <iostream> /// grokkee_type is a vspline::unary_functor returning twice its input template < size_t vsize > struct grokkee_type : public vspline::unary_functor < double , double , vsize > { template < typename IN , typename OUT > void eval ( const IN & in , OUT & out ) const { out = in + in ; } } ; /// 'regular' function template doing the same template < typename T > void twice ( const T & in , T & out ) { out = in + in ; } /// test 'groks' a grokkee_type object 'grokkee' as a vspline::grok_type /// and calls the resulting object, which behaves just as a grokkee_type. template < size_t vsize > void test() { std::cout << "testing grok_type with vsize " << vsize << std::endl ; typedef vspline::grok_type < double , double , vsize > grok_t ; grokkee_type < vsize > grokkee ; grok_t gk ( grokkee ) ; double x = 1.0 ; std::cout << x << " -> " << gk ( x ) << std::endl ; typedef typename grok_t::in_v argtype ; argtype xx = 1.0 ; std::cout << xx << " -> " << gk ( xx ) << std::endl ; } int main ( int argc , char * argv[] ) { test<1>() ; test<4>() ; test<8>() ; // with vsize == 1, we can construct a grok_type with a single // std::function: typedef vspline::grok_type < double , double , 1 > gk1_t ; typedef gk1_t::eval_type ev_t ; ev_t ev = & twice < double > ; vspline::grok_type < double , double , 1 > gk1 ( ev ) ; // with vsize greater than 1, we need another std::function // for vectorized evaluation.
typedef vspline::grok_type < double , double , 8 > gk8_t ; typedef gk8_t::v_eval_type v_ev_t ; typedef gk8_t::in_v v8_t ; v_ev_t vev = & twice < v8_t > ; vspline::grok_type < double , double , 8 > gk8 ( ev , vev ) ; }
kfj-vspline-4b365417c271/example/gsm.cc000066400000000000000000000131461333775006700174500ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE.
*/ /* */ /************************************************************************/ /// \file gsm.cc /// /// \brief calculating the gradient squared magnitude, derivatives /// /// implementation of gsm.cc, performing the calculation of the /// gradient squared magnitude in a loop using two evaluators for /// the two derivatives, adding the squared magnitudes and writing /// the result to an image file /// /// compile with: /// clang++ -std=c++11 -march=native -o gsm -O3 -pthread -DUSE_VC gsm.cc -lvigraimpex -lVc /// or: clang++ -std=c++11 -march=native -o gsm -O3 -pthread gsm.cc -lvigraimpex /// /// invoke passing a colour image file. the result will be written to 'gsm.tif' #include <vspline/vspline.h> #include <vigra/multi_array.hxx> #include <vigra/impex.hxx> #include <iostream> #include <cstdlib> // we silently assume we have a colour image typedef vigra::RGBValue < float > pixel_type; // coordinate_type is a 2D single precision coordinate typedef vigra::TinyVector < float , 2 > coordinate_type ; // target_type is a 2D array of pixels typedef vigra::MultiArray < 2 , pixel_type > target_type ; // b-spline evaluator producing float pixels typedef vspline::evaluator < coordinate_type , pixel_type > ev_type ; int main ( int argc , char * argv[] ) { if ( argc < 2 ) { std::cerr << "pass a colour image file as argument" << std::endl ; exit( -1 ) ; } vigra::ImageImportInfo imageInfo ( argv[1] ) ; // we want a b-spline with natural boundary conditions vigra::TinyVector < vspline::bc_code , 2 > bcv ( vspline::NATURAL ) ; // create cubic 2D b-spline object to receive the image data vspline::bspline < pixel_type , 2 > bspl ( imageInfo.shape() , 3 , bcv ) ; // load the image data into the bspline object's 'core' vigra::importImage ( imageInfo , bspl.core ) ; // prefilter the b-spline bspl.prefilter() ; // we create two evaluators for the b-spline, one for the horizontal and // one for the vertical gradient.
The derivatives for a b-spline are requested // by passing a TinyVector with as many elements as the spline's dimension // with the desired derivative degree for each dimension. Here we want the // first derivative in x and y direction: const vigra::TinyVector < float , 2 > dx1_spec { 1 , 0 } ; const vigra::TinyVector < float , 2 > dy1_spec { 0 , 1 } ; // we pass the derivative specifications to the two evaluators' constructors ev_type xev ( bspl , dx1_spec ) ; ev_type yev ( bspl , dy1_spec ) ; // this is where the result should go: target_type target ( imageInfo.shape() ) ; // quick-shot solution, iterating in a loop, not vectorized for ( int y = 0 ; y < target.shape(1) ; y++ ) { for ( int x = 0 ; x < target.shape(0) ; x++ ) { coordinate_type crd { x , y } ; // now we get the two gradients by evaluating the gradient evaluators // at the given coordinate pixel_type dx , dy ; xev.eval ( crd , dx ) ; yev.eval ( crd , dy ) ; // and conclude by writing the sum of the squared gradients to target target [ crd ] = dx * dx + dy * dy ; } } // store the result with vigra impex vigra::ImageExportInfo eximageInfo ( "gsm.tif" ); std::cout << "storing the target image as 'gsm.tif'" << std::endl ; vigra::exportImage ( target , eximageInfo .setPixelType("UINT8") ) ; exit ( 0 ) ; } kfj-vspline-4b365417c271/example/gsm2.cc000066400000000000000000000210511333775006700175240ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. 
Jahnke */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /// \file gsm2.cc /// /// \brief calculating the gradient squared magnitude, derivatives /// /// alternative implementation of gsm.cc, performing the calculation of the /// gradient squared magnitude with a functor and transform(), which is faster /// since the whole operation is multithreaded and potentially vectorized. /// /// compile with: /// clang++ -std=c++11 -march=native -o gsm2 -O3 -pthread -DUSE_VC gsm2.cc -lvigraimpex -lVc /// or clang++ -std=c++11 -march=native -o gsm2 -O3 -pthread gsm2.cc -lvigraimpex /// /// invoke passing an image file.
the result will be written to 'gsm2.tif' #include <vspline/vspline.h> #include <vigra/multi_array.hxx> #include <vigra/impex.hxx> #include <iostream> #include <cstdlib> // we silently assume we have a colour image typedef vigra::RGBValue < float > pixel_type; // coordinate_type has a 2D coordinate typedef vigra::TinyVector < float , 2 > coordinate_type ; // type of b-spline object typedef vspline::bspline < pixel_type , 2 > spline_type ; // target_type is a 2D array of pixels typedef vigra::MultiArray < 2 , pixel_type > target_type ; // b-spline evaluator producing float pixels typedef vspline::evaluator < coordinate_type , // incoming coordinate's type pixel_type // singular result data type > ev_type ; /// we build a vspline::unary_functor which calculates the sum of gradient squared /// magnitudes. /// Note how the 'compound evaluator' we construct follows a pattern of /// - derive from vspline::unary_functor /// - keep const instances of 'inner' types /// - initialize these in the constructor, yielding a 'pure' functor /// - if the vector code is identical to the unvectorized code, implement /// eval() with a template // we start out by coding the evaluation functor. // this class does the actual computations: struct ev_gsm : public vspline::unary_functor < coordinate_type , pixel_type > { // we create two evaluators for the b-spline, one for the horizontal and // one for the vertical gradient. The derivatives for a b-spline are requested // by passing a TinyVector with as many elements as the spline's dimension // with the desired derivative degree for each dimension.
Here we want the // first derivatives in x and in y direction: const vigra::TinyVector < float , 2 > dx1_spec { 1 , 0 } ; const vigra::TinyVector < float , 2 > dy1_spec { 0 , 1 } ; // we keep two 'inner' evaluators, one for each direction const ev_type xev , yev ; // which are initialized in the constructor, using the bspline and the // derivative specifiers ev_gsm ( const spline_type & bspl ) : xev ( bspl , dx1_spec ) , yev ( bspl , dy1_spec ) { } ; // since the code is the same for vectorized and unvectorized // operation, we can write a template: template < class IN , class OUT > void eval ( const IN & c , OUT & result ) const { OUT dx , dy ; xev.eval ( c , dx ) ; // get the gradient in x direction yev.eval ( c , dy ) ; // get the gradient in y direction // really, we'd like to write: // result = dx * dx + dy * dy ; // but fail due to problems with type inference: for vectorized // types, both dx and dy are vigra::TinyVectors of two simdized // values, and multiplying two such objects is not defined. // It's possible to activate code performing such operations by // defining the relevant traits in namespace vigra, but for this // example we'll stick with using multiply-assigns, which are defined. dx *= dx ; // square the gradients dy *= dy ; dx += dy ; // add them up result = dx ; // assign to result } } ; int main ( int argc , char * argv[] ) { if ( argc < 2 ) { std::cerr << "pass a colour image file as argument" << std::endl ; exit( -1 ) ; } // get the image file name vigra::ImageImportInfo imageInfo ( argv[1] ) ; // we want a b-spline with natural boundary conditions vigra::TinyVector < vspline::bc_code , 2 > bcv ( vspline::NATURAL ) ; // create cubic 2D b-spline object containing the image data spline_type bspl ( imageInfo.shape() , // the shape of the data for the spline 3 , // degree 3 == cubic spline bcv // specifies natural BCs along both axes ) ; // load the image data into the b-spline's core. 
This is a common idiom: // the spline's 'core' is a MultiArrayView to that part of the spline's // data container which precisely corresponds with the input data. // This saves loading the image to some memory first and then transferring // the data into the spline. Since the core is a vigra::MultiArrayView, // we can pass it to importImage as the desired target for loading the // image from disk. vigra::importImage ( imageInfo , bspl.core ) ; // prefilter the b-spline bspl.prefilter() ; // now we construct the gsm evaluator ev_gsm ev ( bspl ) ; // this is where the result should go: target_type target ( imageInfo.shape() ) ; // now we obtain the result by performing a vspline::transform. This function // successively passes discrete coordinates into the target to the evaluator // it's invoked with, storing the result of the evaluator's evaluation // at the self-same coordinates. This is done multithreaded and vectorized // automatically, so it's very convenient, if the evaluator is at hand. // So here we have invested moderately more coding effort in the evaluator // and are rewarded with being able to use the evaluator with vspline's // high-level code for a fast implementation of our gsm problem. // The difference is quite noticeable.
On my system, processing a full HD // image, I get: // $ time ./gsm image.jpg // // real 0m0.838s // user 0m0.824s // sys 0m0.012s // // $ time ./gsm2 image.jpg // // real 0m0.339s // user 0m0.460s // sys 0m0.016s vspline::transform ( ev , target ) ; // store the result with vigra impex vigra::ImageExportInfo eximageInfo ( "gsm2.tif" ); std::cout << "storing the target image as 'gsm2.tif'" << std::endl ; vigra::exportImage ( target , eximageInfo .setPixelType("UINT8") ) ; exit ( 0 ) ; } kfj-vspline-4b365417c271/example/impulse_response.cc000066400000000000000000000106601333775006700222540ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. 
*/ /* */ /************************************************************************/ /// \file impulse_response.cc /// /// \brief get the impulse response of a b-spline prefilter /// /// filter a unit pulse with a b-spline prefilter of a given degree /// and display the central section of the result /// /// compile with: /// clang++ -std=c++11 -o impulse_response -O3 -pthread impulse_response.cc /// /// to get the central section with values beyond +/- 0.0042 of a degree 5 b-spline: /// /// impulse_response 5 .0042 /// /// producing this output: /// /// long double ir_5[] = { /// -0.0084918610197410 , /// +0.0197222540252632 , /// -0.0458040841925519 , /// +0.1063780046433000 , /// -0.2470419274022756 , /// +0.5733258709616592 , /// -1.3217294729875093 , /// +2.8421709220216247 , /// -1.3217294729875098 , /// +0.5733258709616593 , /// -0.2470419274022757 , /// +0.1063780046433000 , /// -0.0458040841925519 , /// +0.0197222540252632 , /// -0.0084918610197410 , /// } ; /// /// which, when used as a convolution kernel, will have the same effect on a signal /// as applying the recursive filter itself, but with lessened precision due to windowing. 
#include <vspline/vspline.h> #include <iostream> #include <iomanip> #include <cstdlib> #include <cassert> int main ( int argc , char * argv[] ) { if ( argc < 3 ) { std::cerr << "please pass spline degree and cutoff on the command line" << std::endl ; exit ( -1 ) ; } int degree = std::atoi ( argv[1] ) ; assert ( degree >= 0 && degree <= vspline_constants::max_degree ) ; long double cutoff = std::atof ( argv[2] ) ; std::cout << "calculating impulse response with spline degree " << degree << " and cutoff " << cutoff << std::endl ; // using the highest-level access to prefiltering, we code: vspline::bspline < long double , 1 > bsp ( 1001 , degree ) ; auto v1 = bsp.core ; v1 [ 500 ] = 1.0 ; bsp.prefilter() ; std::cout << "long double ir_" << degree << "[] = {" << std::endl ; std::cout << std::fixed << std::showpoint << std::setprecision(std::numeric_limits<long double>::max_digits10) ; for ( int k = 0 ; k < 1001 ; k++ ) { if ( std::abs ( v1[k] ) > cutoff ) { std::cout << v1[k] << "L ," << std::endl ; } } std::cout << "} ;" << std::endl ; }
kfj-vspline-4b365417c271/example/int_spline.cc000066400000000000000000000171301333775006700210230ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software.
*/ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /*! \file int_spline.cc \brief using a b-spline with integer coefficients vspline can use integral data types as b-spline coefficients, but doing so naively will produce imprecise results. This example demonstrates how to do it right. What needs to be done is using 'boosted', or amplified, coefficients, so that the range of possible coefficients is 'spread out' to the range of the integral coefficient type. After evaluation, the 'raw' result value has to be attenuated back to the range of original values. You can pass the 'boost' factor on the command line to see how different values influence precision of the results. You'll also notice that, when passing too high 'boost' values, the results will turn wrong due to the int coefficients overflowing. Prefiltering will create a signal with higher peak values, so the maximal boost factor has to be chosen to take this into account, see 'max_boost' below. */ #include <vspline/vspline.h> #include <iostream> #include <cstdlib> #include <cassert> /// given cf_type, the type for a b-spline coefficient, min and max /// as the signal's minimum and maximum, and the spline's degree, /// max_boost calculates the maximal safe boost to apply during /// prefiltering so that the signal won't be damaged. This is /// conservative, assuming the worst case (the value oscillating /// between min and max).
Instead of doing the 'proper' maths, we /// take the easy route: build a periodic b-spline over min and max /// as the sole knot points, prefilter, get the coefficient with /// the highest absolute value. Then divide the maximal value /// of cf_type by this coefficient's absolute value. // TODO might put this function into the library code using vspline::xlf_type ; template < typename cf_type > static xlf_type max_boost ( xlf_type min , xlf_type max , const int & spline_degree ) { vspline::bspline < xlf_type , 1 > sample ( 2 , spline_degree , vspline::PERIODIC ) ; sample.core[0] = min ; sample.core[1] = max ; sample.prefilter() ; xlf_type d = std::max ( std::abs ( sample.core[0] ) , std::abs ( sample.core[1] ) ) ; if ( d != 0 ) return std::numeric_limits < cf_type > ::max() / d ; return 1 ; } int main ( int argc , char * argv[] ) { double min_x = -1.0 ; double max_x = 1.0 ; // per default, apply maximum boost double boost = max_boost < int > ( min_x , max_x , 3 ) ; // user has passed a different value on the CL; use that instead if ( argc > 1 ) boost = std::atof ( argv[1] ) ; assert ( boost != 0.0 ) ; // we'll create a spline over two knot points only: vigra::MultiArray < 1 , double > original ( 2 ) ; original[0] = 1 ; original[1] = -1 ; // create a bspline object holding int coefficients vspline::bspline < int , 1 > ibspl ( 2 , 3 , vspline::PERIODIC ) ; // prefilter with 'boost' ibspl.prefilter ( original , boost ) ; // create a reference b-spline with double coefficients over the // same data, using no 'boost'. vspline::bspline < double , 1 > bspl ( 2 , 3 , vspline::PERIODIC ) ; bspl.prefilter ( original ) ; // to compare the results of using the 'boosted' int spline with the // 'ordinary' b-spline with double coefficients, we create evaluators // of both types. the first evaluator, which takes int coefficients, // is piped to an amplify_type object, which multiplies with a factor // of 1/boost, putting the signal back in range.
// we need these template arguments, in sequence: // evaluator taking float coordinates and yielding double, aggregating // into SIMD types with 8 elements, using general b-spline evaluation, // doing maths in double precision and using int spline coefficients: typedef vspline::evaluator < float , double , 8 , -1 , double , int > iev_t ; // create such an evaluator from 'ibspl', the b-spline holding // (boosted) int coefficients iev_t _iev ( ibspl ) ; // to 'undo' the boost we use an amplify_type object with a // factor which is the reciprocal of 'boost' auto attenuate = vspline::amplify_type < double , double , double , 8 > ( 1.0 / boost ) ; // chain 'attenuate' to the evaluator, so that the evaluator's // output is multiplied with the factor in 'attenuate' auto iev = _iev + attenuate ; // the equivalent evaluator using the spline 'bspl' which holds // double coefficients with no boost, first it's type, then the object typedef vspline::evaluator < float , double , 8 > ev_t ; ev_t ev ( bspl ) ; // now we call both evaluators at 0 and 1, the extrema, // expecting to see the results 1 and -1, respectively. // If boost is too low, we may find quantization errors, // if it is too high, the signal will be damaged. 
  std::cout << "iev ( 0 ) = " << iev ( 0.0f ) << std::endl ;
  std::cout << " ev ( 0 ) = " << ev ( 0.0f ) << std::endl ;
  std::cout << "iev ( 1 ) = " << iev ( 1.0f ) << std::endl ;
  std::cout << " ev ( 1 ) = " << ev ( 1.0f ) << std::endl ;

  // just to double-check: vectorized operation with 'iev'

  auto iv = vspline::simdized_type < float , 8 > ( 0 ) ;
  std::cout << iv << " -> " << iev ( iv ) << std::endl ;
  iv = vspline::simdized_type < float , 8 > ( 1 ) ;
  std::cout << iv << " -> " << iev ( iv ) << std::endl ;
  std::cout << "used boost = " << boost << std::endl ;
}

example/mandelbrot.cc:

/************************************************************************/
/*  vspline - a set of generic tools for creation and evaluation        */
/*  of uniform b-splines                                                */
/*  Copyright 2017 - 2018 by Kay F. Jahnke                              */
/*  (MIT-style licence, as in the LICENSE file above)                   */
/************************************************************************/

/// \file mandelbrot.cc
///
/// \brief calculate an image of a section of the mandelbrot set
///
/// to demonstrate that vspline's transform routines don't have to use
/// b-splines at all, here's a simple example creating a functor
/// to perform the iteration leading to the mandelbrot set, together
/// with a vspline::domain adapting the coordinates and another
/// functor to do the 'colorization'. The three functors are chained
/// and fed to vspline::transform() to yield the image.
///
/// compile with:
/// clang++ -std=c++11 -march=native -o mandelbrot -O3 -pthread -DUSE_VC mandelbrot.cc -lvigraimpex -lVc
/// or: clang++ -std=c++11 -march=native -o mandelbrot -O3 -pthread mandelbrot.cc -lvigraimpex
///
/// invoke like
///
/// mandelbrot -2 -1 1 1
///
/// ( -2 , -1 ) being lower left and ( 1 , 1 ) upper right
///
/// the result will be written to 'mandelbrot.tif'

#include
#include
#include
#include
#include

// we want a colour image

typedef vigra::RGBValue pixel_type;

// coordinate_type has a 2D coordinate

typedef vigra::TinyVector < double , 2 > coordinate_type ;

typedef typename vspline::vector_traits < coordinate_type > :: type crd_v ;
typedef typename vspline::vector_traits < double > :: type compo_v ;
const int VSZ = vspline::vector_traits < coordinate_type > :: size ;
typedef typename vspline::vector_traits < int , VSZ > :: type int_v ;

// target_type is a 2D array of pixels

typedef vigra::MultiArray < 2 , pixel_type > target_type ;

struct mandelbrot_functor
: public vspline::unary_functor < coordinate_type , int >
{
  typedef typename vspline::unary_functor < coordinate_type , int > base_type ;
  using base_type::vsize ;

  const int max_iterations = 255 ;
  const double threshold = 1000.0 ;

  // the single-value evaluation recognizably uses the well-known
  // iteration formula

  void eval ( const coordinate_type & c , int & m ) const
  {
    std::complex < double > cc ( c[0] , c[1] ) ;
    std::complex < double > z ( 0.0 , 0.0 ) ;
    for ( m = 0 ; m < max_iterations ; m++ )
    {
      z = z * z + cc ;
      if ( std::abs ( z ) > threshold )
        break ;
    }
  }

  // the vector code is a bit more involved, since the vectorized type
  // for a std::complex is a vigra::TinyVector of two SIMD types,
  // and we implement the complex maths 'manually':

  void eval ( const crd_v & c , int_v & m ) const
  {
    // state of the iteration
    crd_v z { 0.0f , 0.0f } ;
    // iteration count
    m = 0.0f ;

    for ( int i = 0 ; i < max_iterations ; i++ )
    {
      // z = z * z ; using complex maths
      compo_v rr = z[0] * z[0] ;
      compo_v ii = z[1] * z[1] ;
      compo_v ri = z[0] * z[1] ;
      z[0] = rr - ii ;
      z[1] = 2.0f * ri ;
      // create a mask for those values which haven't exceeded the
      // threshold yet
      rr += ii ;
      auto mm = ( rr < threshold * threshold ) ;
      // if the mask is empty, all values have exceeded the threshold
      // and we end the iteration
      if ( none_of ( mm ) )
        break ;
      // now we add 'c', the coordinate
      z += c ;
      // and increase the iteration count for all values which haven't
      // exceeded the threshold
      vspline::assign_if ( m , mm , m + 1 ) ;
    }
  }
} ;

struct colorize
: public vspline::unary_functor < int , pixel_type , VSZ >
{
  // to 'colorize' we produce black-and-white from the incoming
  // value's LSB

  template < class IN , class OUT >
  void eval ( const IN & c , OUT & result ) const
  {
    result = 255 * ( c & 1 ) ;
  } ;
} ;

int main ( int argc , char * argv[] )
{
  // get the extent of the section to show

  if ( argc < 5 )
  {
    std::cerr << "please pass x0, y0, x1 and y1 on the command line"
              << std::endl ;
    exit ( -1 ) ;
  }

  double x0 = atof ( argv[1] ) ;
  double y0 = atof ( argv[2] ) ;
  double x1 = atof ( argv[3] ) ;
  double y1 = atof ( argv[4] ) ;

  // this is where the
result should go:

  target_type target ( vigra::Shape2 ( 1920 , 1080 ) ) ;

  // the domain maps the image coordinates to the coordinates of the
  // section we want to display. The mapped coordinates, now in the range
  // of ((x0,y0), (x1,y1)), are fed to the functor calculating the result
  // of the iteration, and its results are fed to a 'colorize' object
  // which translates the iteration depth values to a pixel value.

  auto f = vspline::domain < coordinate_type , VSZ >
             ( coordinate_type ( 0 , 0 ) ,
               coordinate_type ( 1919 , 1079 ) ,
               coordinate_type ( x0 , y0 ) ,
               coordinate_type ( x1 , y1 ) )
           + mandelbrot_functor()
           + colorize() ;

  // the combined functor is passed to transform(), which uses it for
  // every coordinate pair in 'target' and deposits the result at the
  // corresponding location.

  vspline::transform ( f , target ) ;

  // store the result with vigra impex

  vigra::ImageExportInfo imageInfo ( "mandelbrot.tif" );
  std::cout << "storing the target image as 'mandelbrot.tif'"
            << std::endl ;

  vigra::exportImage ( target ,
                       imageInfo
                       .setPixelType("UINT8")
                       .setCompression("100")
                       .setForcedRangeMapping ( 0 , 255 , 0 , 255 ) ) ;
}

example/n_shift.cc:

/************************************************************************/
/*  vspline - a set of generic tools for creation and evaluation        */
/*  of uniform b-splines                                                */
/*  Copyright 2017 - 2018 by Kay F. Jahnke                              */
/*  (MIT-style licence, as in the LICENSE file above)                   */
/************************************************************************/

/// \file n_shift.cc
///
/// \brief fidelity test
///
/// This is a test to see how much a signal degrades when it is submitted
/// to a sequence of operations:
///
/// - create a b-spline over the signal
/// - evaluate the spline at unit-spaced locations with an arbitrary offset
/// - yielding a shifted signal, for which the process is repeated
///
/// Finally, a last shift is performed which samples the penultimate version
/// of the signal at points coinciding with coordinates 0...N-1 of the
/// original signal. This last iteration should ideally recreate the
/// original sequence.
///
/// The test is done with a periodic signal to avoid margin effects.
/// The initial sequence is created by evaluating a periodic high-degree
/// b-spline of half the size at steps n/2. This way we start out
/// with a signal with low high-frequency content - a signal which can
/// be approximated well with a b-spline. Optionally, the supersampling
/// factor can be passed on the command line to experiment with different
/// values than the default of 2. Supersampling factors should be whole
/// numbers (halves also work, but are less precise) - so that the knot
/// points of the original signal coincide with knot points of the
/// supersampled signal. This way, every interval contains a partial
/// polynomial with no discontinuities, and the spline can faithfully
/// represent the signal.
///
/// See also bls.cpp, which produces test signals using IFFT.
///
/// compile with: clang++ -pthread -O3 -std=c++11 n_shift.cc -o n_shift
///
/// invoke like: n_shift 17 500

#include
#include
#include
#include
#include

int main ( int argc , char * argv[] )
{
  if ( argc < 3 )
  {
    std::cerr << "pass the spline's degree, the number of iterations"
              << std::endl
              << "and optionally the supersampling factor"
              << std::endl ;
    exit ( -1 ) ;
  }

  int degree = std::atoi ( argv[1] ) ;
  assert ( degree >= 0 && degree <= vspline_constants::max_degree ) ;

  int iterations = 1 + std::atoi ( argv[2] ) ;

  const int sz = 1024 ;
  long double widen = 2.0 ;
  if ( argc > 3 )
    widen = atof ( argv[3] ) ;
  assert ( widen >= 1.0 ) ;
  int wsz = sz * widen ;

  vigra::MultiArray < 1 , long double > original ( wsz ) ;
  vigra::MultiArray < 1 , long double > target ( wsz ) ;

  // we start out by filling the first bit of 'original' with random data
  // between -1 and 1

  std::random_device rd ;
  std::mt19937 gen ( rd() ) ;
  // gen.seed ( 1 ) ; // level playing field
  std::uniform_real_distribution<> dis ( -1 , 1 ) ;

  for ( int x = 0 ; x < sz ; x++ )
    original [ x ] = dis ( gen ) ;

  // create the bspline object to produce the data we'll work with

  vspline::bspline < long double ,       // spline's data type
                     1 >                 // one dimension
    bsp ( sz ,                           // sz values
          20 ,                           // high degree for smoothness
          vspline::PERIODIC ,            // periodic boundary conditions
          0.0 ) ;                        // no tolerance

  vigra::MultiArrayView < 1 , long double >
    initial ( vigra::Shape1(sz) , original.data() ) ;

  // pull in the data while prefiltering

  bsp.prefilter ( initial ) ;

  // create an evaluator to obtain interpolated values

  typedef vspline::evaluator < long double , long double > ev_type ;
  ev_type ev ( bsp ) ; // from the bspline object we just made

  // now we evaluate at 1/widen steps, into target

  for ( int x = 0 ; x < wsz ; x++ )
    target [ x ] = ev ( (long double) ( x ) / (long double) ( widen ) ) ;

  // we take this signal as our original. Since this is a sampling
  // of a periodic signal (the spline in bsp) representing a full
  // period, we assume that a b-spline over this signal will, within
  // the spline's capacity, approximate the signal in 'original'.
  // what we want to see is how sampling at offsetted positions
  // and recreating a spline over the offsetted signal will degrade
  // the signal with different-degree b-splines and different numbers
  // of iterations.

  original = target ;

  // now we set up the working spline

  vspline::bspline < long double ,       // spline's data type
                     1 >                 // one dimension
    bspw ( wsz ,                         // wsz values
           degree ,                      // degree as per command line
           vspline::PERIODIC ,           // periodic boundary conditions
           0.0 ) ;                       // no tolerance

  // we pull in the working data we've just generated

  bspw.prefilter ( original ) ;

  // and set up the evaluator for the test

  ev_type evw ( bspw ) ;

  // we want to map the incoming coordinates into the defined range.
  // Since we're using a periodic spline, the range is from 0...N,
  // rather than 0...N-1 for non-periodic splines

  auto gate = vspline::periodic ( 0.0L , (long double)(wsz) ) ;

  // now we do a bit of functional programming.
  // we chain gate and evaluator:

  auto periodic_ev = gate + evw ;

  // we cumulate the offsets so we can 'undo' the cumulated offset
  // in the last iteration

  long double cumulated_offset = 0.0 ;

  for ( int n = 0 ; n < iterations ; n++ )
  {
    using namespace vigra::multi_math ;
    using namespace vigra::acc;

    // use a random, largish offset (+/- 1000). any offset
    // will do, since we have a periodic gate, mapping the
    // coordinates for evaluation into the spline's range

    long double offset = 1000.0 * dis ( gen ) ;

    // with the last iteration, we shift back to the original
    // 0-based locations. This last shift should recreate the
    // original signal as best as a spline of this degree can
    // do after so many iterations.

    if ( n == iterations - 1 )
      offset = - cumulated_offset ;

    cumulated_offset += offset ;

    if ( n > ( iterations - 10 ) )
      std::cout << "iteration " << n << " offset " << offset
                << " cumulated offset " << cumulated_offset << std::endl ;

    // we evaluate the spline at unit-stepped offsetted locations,
    // so, 0 + offset , 1 + offset ...
    // in the last iteration, this should ideally reproduce the original
    // signal.

    for ( int x = 0 ; x < wsz ; x++ )
    {
      auto arg = x + offset ;
      target [ x ] = periodic_ev ( arg ) ;
    }

    // now we create a new spline over target, reusing bspw
    // note how this merely changes the coefficients of the spline,
    // the container for the coefficients is reused, and therefore
    // the evaluator (evw) will look at the new set of coefficients.
    // So we don't need to create a new evaluator.

    bspw.prefilter ( target ) ;

    // to convince ourselves that we really are working on a different
    // sampling of the signal - and to see how close we get to the
    // original signal after n iterations, when we use an offset to get
    // the sampling locations back to 0, 1, ...
    vigra::MultiArray < 1 , long double > error_array
      ( vigra::multi_math::squaredNorm ( target - original ) ) ;

    AccumulatorChain < long double , Select < Mean, Maximum > > ac ;
    extractFeatures ( error_array.begin() , error_array.end() , ac ) ;

    if ( n > ( iterations - 10 ) )
    {
      if ( n == iterations - 1 )
        std::cout << "final result, evaluating at original unit steps"
                  << std::endl ;
      std::cout << "signal difference Mean:    "
                << sqrt ( get < Mean > ( ac ) ) << std::endl;
      std::cout << "signal difference Maximum: "
                << sqrt ( get < Maximum > ( ac ) ) << std::endl;
    }
  }
}

example/polish.cc:

/************************************************************************/
/*  vspline - a set of generic tools for creation and evaluation        */
/*  of uniform b-splines                                                */
/*  Copyright 2015 - 2018 by Kay F. Jahnke                              */
/*  The git repository for this software is at                          */
/*  https://bitbucket.org/kfj/vspline                                   */
/*  Please direct questions, bug reports, and contributions to          */
/*  kfjahnke+vspline@gmail.com                                          */
/*  (MIT-style licence, as in the LICENSE file above)                   */
/************************************************************************/

/*! \file polish.cc

    \brief 'polish' a b-spline several times, to see if its precision
    can be improved. It turns out that, starting out with zero tolerance
    for the coefficient generation, there is very little room for
    improvement.
*/

#include
#include
#include
#include
#include
#include
#include
#include
#include

bool verbose = true ; // false ;

// 'condense' aggregate types (TinyVectors etc.) into a single value

template < typename T >
double condense ( const T & t , std::true_type )
{
  return std::abs ( t ) ;
}

template < typename T >
double condense ( const T & t , std::false_type )
{
  return sqrt ( sum ( t * t ) ) / t.size() ;
}

template < typename T >
double condense ( const std::complex < T > & t , std::false_type )
{
  return std::abs ( t ) ;
}

template < class T >
using is_singular = typename std::conditional
                    < std::is_fundamental < T > :: value ,
                      std::true_type ,
                      std::false_type
                    > :: type ;

template < typename T >
double condense ( const T & t )
{
  return condense ( t , is_singular < T > () ) ;
}

template < int dim , typename T >
double check_diff ( vigra::MultiArrayView < dim , T > & reference ,
                    vigra::MultiArrayView < dim , T > & candidate )
{
  using namespace vigra::multi_math ;
  using namespace vigra::acc;

  assert ( reference.shape() == candidate.shape() ) ;

  vigra::MultiArray < 1 , double >
    error_array ( vigra::Shape1(reference.size() ) ) ;

  for ( int i = 0 ; i < reference.size() ; i++ )
  {
    auto help = candidate[i] - reference[i] ;
    // std::cerr << reference[i] << " <> " << candidate[i]
    //           << " CFD " << help << std::endl ;
    error_array [ i ] = condense ( help ) ;
  }

  AccumulatorChain < double , Select < Mean, Maximum >
                   > ac ;
  extractFeatures ( error_array.begin() , error_array.end() , ac ) ;

  double mean = get < Mean > ( ac ) ;
  double max = get < Maximum > ( ac ) ;

  if ( verbose )
  {
    std::cout << "delta Mean: " << mean << std::endl;
    std::cout << "delta Maximum: " << max << std::endl;
  }
  return max ;
}

template < int dim , typename T >
void polish_test ( vigra::TinyVector < int , dim > shape ,
                   vspline::bc_code bc ,
                   int spline_degree )
{
  typedef vigra::MultiArray < dim , T > array_type ;
  typedef vspline::bspline < T , dim > spline_type ;

  array_type arr ( shape ) ;
  vigra::TinyVector < vspline::bc_code , dim > bcv { bc } ;
  spline_type bsp ( shape , spline_degree , bcv , 0.0 ) ;

  if ( verbose )
    std::cout << "created b-spline:" << std::endl << bsp << std::endl ;

  std::random_device rd ;
  std::mt19937 gen ( rd() ) ;
  // gen.seed ( 765 ) ; // level playing field
  std::uniform_real_distribution<> dis ( -1.0 , 1.0 ) ;

  for ( auto & e : arr )
    e = dis ( gen ) ;                // fill array with random data

  array_type reference = arr ;      // hold a copy of these data

  bsp.prefilter ( arr ) ;           // suck into b-spline via prefilter

  vspline::restore < dim , T > ( bsp , arr ) ; // restore back to arr

  if ( verbose )
    std::cout << "after restoration of original data:" << std::endl ;

  double emax = check_diff < dim , T > ( reference , arr ) ;

  if ( verbose )
  {
    // print a summary, can use '| grep CF' on cout
    std::cout << typeid(T()).name() << " CF "
              // << vspline::pfs_name [ pfs ][0]
              << " D " << dim << " " << arr.shape()
              << " BC " << vspline::bc_name[bc]
              << " DG " << spline_degree
              << " TOL " << bsp.tolerance
              << " EMAX " << emax << std::endl ;
  }

  for ( int times = 1 ; times < 5 ; times++ )
  {
    // difference original / restored
    arr -= reference ;
    // create another bspline
    spline_type polish_spl ( arr.shape() , spline_degree , bcv , 0.0 ) ;
    // prefilter it, sucking in the difference
    polish_spl.prefilter ( arr ) ;
    // and layer the resulting coeffs on top of the previous set
    bsp.core -= polish_spl.core ;
    // brace the 'augmented' spline
    bsp.brace() ;
    // and check its
quality vspline::restore < dim , T > ( bsp , arr ) ; if ( verbose ) std::cout << "after polishing run " << times << std::endl ; double emax2 = check_diff < dim , T > ( reference , arr ) ; if ( verbose ) { // print a summary, can use '| grep CF' on cout std::cout << typeid(T()).name() << " CF " // << vspline::pfs_name [ pfs ][0] << " D " << dim << " " << arr.shape() << " BC " << vspline::bc_name[bc] << " DG " << spline_degree << " TOL " << bsp.tolerance << " EMAX " << emax2 << std::endl ; } } } int main ( int argc , char * argv[] ) { if ( argc < 2 ) { std::cerr << "pass the spline degree as parameter" << std::endl ; exit ( -1 ) ; } int degree = std::atoi ( argv[1] ) ; polish_test < 2 , float > ( { 1000 , 1000 } ,vspline::PERIODIC , degree ) ; polish_test < 2 , double > ( { 1000 , 1000 } ,vspline::PERIODIC , degree ) ; polish_test < 2 , long double > ( { 1000 , 1000 } ,vspline::PERIODIC , degree ) ; } kfj-vspline-4b365417c271/example/quickstart.cc000066400000000000000000000126131333775006700210520ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. 
*/ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /// \file quickstart.cc /// /// \brief sample code from the documentation /// /// just the bits of code given in the 'Quickstart' section of the documentation overview /// /// compile: clang++ -std=c++11 -o quickstart -pthread quickstart.cc #include #include #include using namespace std ; using namespace vigra ; using namespace vspline ; int main ( int argc , char * argv[] ) { // given a vigra::MultiArray of data (initialization omitted) vigra::MultiArray < 2 , float > a ( 10 , 20 ) ; // let's initialize the whole array with 42 a = 42 ; typedef vspline::bspline < float , 2 > spline_type ; // fix the type of the spline spline_type bspl ( a.shape() ) ; // create bspline object 'bspl' suitable for your data bspl.core = a ; // copy the source data into the bspline object's 'core' area bspl.prefilter() ; // run prefilter() to convert original data to b-spline coefficients // for a 2D spline, we want 2D coordinates typedef vigra::TinyVector < float ,2 > coordinate_type ; // get the appropriate evaluator type typedef vspline::evaluator < coordinate_type , float > eval_type ; // create the evaluator eval_type ev ( bspl ) ; // create variables for input and output: float x = 3 , y = 4 ; coordinate_type coordinate ( x , y ) ; float result ; // use the evaluator to produce the result ev.eval ( coordinate , result ) ; // evaluate at (x,y) auto r = ev ( coordinate ) ; // alternative 
evaluation as a functor

  assert ( r == result ) ;

  // create a 1D array containing (2D) coordinates into 'a'

  vigra::MultiArray < 1 , coordinate_type > coordinate_array ( 3 ) ;

  // we initialize the coordinate array by hand...

  coordinate_array[0] = coordinate_array[1]
                      = coordinate_array[2]
                      = coordinate ;

  // create an array to accommodate the result of the remap operation

  vigra::MultiArray < 1 , float > target_array ( 3 ) ;

  // perform the remap

  vspline::remap ( a , coordinate_array , target_array ) ;

  auto ic = coordinate_array.begin() ;
  for ( auto k : target_array )
    assert ( k == ev ( *(ic++) ) ) ;

  // instead of the remap, we can use transform, passing the evaluator for
  // the b-spline over 'a' instead of 'a' itself. the result is the same.

  vspline::transform ( ev , coordinate_array , target_array ) ;

  // create a 2D array for the index-based transform operation

  vigra::MultiArray < 2 , float > target_array_2d ( 3 , 4 ) ;

  // use transform to evaluate the spline for the coordinates of
  // all values in this array

  vspline::transform ( ev , target_array_2d ) ;

  for ( int x = 0 ; x < 3 ; x ++ )
  {
    for ( y = 0 ; y < 4 ; y++ )
    {
      coordinate_type c { float(x) , float(y) } ;
      assert ( target_array_2d [ c ] == ev ( c ) ) ;
    }
  }

  vigra::MultiArray < 2 , float > b ( 10 , 20 ) ;
  vspline::transform ( ev , b ) ;

  auto ia = a.begin() ;
  for ( auto r : b )
    assert ( vigra::closeAtTolerance ( *(ia++) , r , .00001 ) ) ;

  vigra::MultiArray < 2 , float > c ( 10 , 20 ) ;
  vspline::restore ( bspl , c ) ; // TODO: problem with g++

  auto ib = b.begin() ;
  for ( auto & ic : c )
    assert ( vigra::closeAtTolerance ( *(ib++) , ic , .00001 ) ) ;
}

example/restore_test.cc:

/************************************************************************/
/*  vspline - a set of generic tools for creation and evaluation        */
/*  of uniform b-splines                                                */
/*  Copyright 2015 - 2018 by Kay F. Jahnke                              */
/*  (MIT-style licence, as in the LICENSE file above)                   */
/************************************************************************/

/// restore_test - create b-splines from random data and restore the original
/// data from the spline. This has grown to function as a unit test
/// instantiating all sorts of splines and evaluators and using the evaluators
/// with the functions in transform.h.
///
/// Ideally, this test should result in restored data which are identical to
/// the initial random data, but several factors come into play:
///
/// the precision of the coefficient calculation is parametrizable in vspline,
/// via the 'tolerance' parameter, or, in some places, the 'horizon' parameter.
/// /// the data type of the spline is important - single precision data are unfit /// to produce coefficients capable of reproducing the data very precisely /// /// the spline degree is important. High degrees need wide horizons, but wide /// horizons also need many calculation steps, which may introduce errors. /// /// The dimension of the spline is also important. Since higher-dimension /// splines need more calculations for prefiltering and evaluation, results are /// less precise. This test program can test up to 4D. /// /// Please note that, since this program uses a good many different data types /// and tries to test as much of vspline's code as possible, compile times may /// be very long, especially if the higher-D tests are commented out. /// /// With dimensions > 2 running this program in full takes a long time, since /// it tries to be comprehensive to catch any corner cases which might have /// escaped scrutiny. Run time of this program is not a very good indicator /// of vspline's speed, since many operations (like statistics, array copy /// operations) are neither multithreaded nor vectorized. For speed tests, /// 'roundtrip' or 'grind' are better. #include #include #include #include #include #include #include bool verbose = true ; // false ; // 'condense' aggregate types (TinyVectors etc.) 
into a single value

template < typename T >
double condense ( const T & t , std::true_type )
{
  return std::abs ( t ) ;
}

template < typename T >
double condense ( const T & t , std::false_type )
{
  return sqrt ( sum ( t * t ) ) / t.size() ;
}

template < typename T >
double condense ( const std::complex < T > & t , std::false_type )
{
  return std::abs ( t ) ;
}

template < class T >
using is_singular = typename std::conditional
                    < std::is_fundamental < T > :: value ,
                      std::true_type ,
                      std::false_type
                    > :: type ;

template < typename T >
double condense ( const T & t )
{
  return condense ( t , is_singular < T > () ) ;
}

// compare two arrays and calculate the mean and maximum difference

template < int dim , typename T >
double check_diff ( vigra::MultiArrayView < dim , T > & reference ,
                    vigra::MultiArrayView < dim , T > & candidate )
{
  using namespace vigra::multi_math ;
  using namespace vigra::acc;

  assert ( reference.shape() == candidate.shape() ) ;

  vigra::MultiArray < 1 , double >
    error_array ( vigra::Shape1(reference.size() ) ) ;

  for ( int i = 0 ; i < reference.size() ; i++ )
  {
    auto help = candidate[i] - reference[i] ;
    // std::cerr << reference[i] << " <> " << candidate[i]
    //           << " CFD " << help << std::endl ;
    error_array [ i ] = condense ( help ) ;
  }

  AccumulatorChain < double , Select < Mean, Maximum > > ac ;
  extractFeatures ( error_array.begin() , error_array.end() , ac ) ;

  double mean = get < Mean > ( ac ) ;
  double max = get < Maximum > ( ac ) ;

  if ( verbose )
  {
    std::cout << "delta Mean: " << mean << std::endl;
    std::cout << "delta Maximum: " << max << std::endl;
  }
  return max ;
}

/// do a restore test. This test fills the array that's
/// passed in with small random data, constructs a b-spline with the requested
/// parameters over the data, then calls vspline::restore(), which evaluates the
/// spline at discrete locations.
/// While this test fails to address several aspects (derivatives, behaviour
/// at locations which aren't discrete), it does make sure that prefiltering
/// has produced a correct result and reconstruction succeeds: If the spline
/// coefficients were wrong, reconstruction would fail just as it would if
/// the coefficients were right and the evaluation code was wrong. That both
/// should be wrong and accidentally produce a correct result is highly unlikely.
///
/// This routine has grown to be more of a unit test for all of vspline,
/// additional vspline functions are executed and the results are inspected.
/// This way we can assure that the transform-type routines are usable with
/// all supported data types and vectorized and unvectorized results are
/// consistent.

template < int dim , typename T >
double restore_test ( vigra::MultiArrayView < dim , T > & arr ,
                      vspline::bc_code bc ,
                      int spline_degree )
{
  if ( verbose )
    std::cout << "**************************************" << std::endl
              << "testing type " << typeid(T()).name() << std::endl ;

  typedef vigra::MultiArray < dim , T > array_type ;
  typedef vspline::bspline < T , dim > spline_type ;

  vigra::TinyVector < vspline::bc_code , dim > bcv { bc } ;
  spline_type bsp ( arr.shape() , spline_degree , bcv , 0.0 ) ;

  if ( verbose )
    std::cout << "created b-spline:" << std::endl << bsp << std::endl ;

  std::random_device rd ;
  std::mt19937 gen ( rd() ) ;
  // gen.seed ( 765 ) ; // level playing field
  std::uniform_real_distribution<> dis ( -1.0 , 1.0 ) ;

  for ( auto & e : arr )
    e = dis ( gen ) ;

  array_type reference = arr ;

  bsp.prefilter ( arr ) ;

  vspline::restore < dim , T > ( bsp , arr ) ;

  if ( verbose )
    std::cout << "after restoration of original data:" << std::endl ;

  double emax = check_diff < dim , T > ( reference , arr ) ;

  if ( verbose )
  {
    // print a summary, can use '| grep CF' on cout
    std::cout << typeid(T()).name() << " CF "
              << " D " << dim << " " << arr.shape()
              << " BC " << vspline::bc_name[bc]
              << " DG " << spline_degree
              << " TOL " << bsp.tolerance
              << " EMAX " << emax << std::endl ;
  }

  // have: coeffs in bsp, original data in reference, restored data in arr
  //
  // the next bit of code implements a trial of 'spline polishing':
  // the coefficients are filtered with the reconstruction kernel, and
  // the original data are subtracted from the resulting 'restored' data,
  // producing an array of errors. A spline is erected over the
  // errors and its coefficients are subtracted from the coefficients
  // in the first spline, 'augmenting' it to better represent the
  // original data. Seems to squash the error pretty much to arithmetic
  // precision.

  // difference original / restored
  arr -= reference ;
  // create another bspline
  spline_type polish ( arr.shape() , spline_degree , bcv , 0.0 ) ;
  // prefilter it, sucking in the difference
  polish.prefilter ( arr ) ;
  // and layer the resulting coeffs on top of the previous set
  bsp.core -= polish.core ;
  // brace the 'augmented' spline
  bsp.brace() ;
  // and check its quality
  vspline::restore < dim , T > ( bsp , arr ) ;

  if ( verbose )
    std::cout << "after polishing of spline:" << std::endl ;

  double emax2 = check_diff < dim , T > ( reference , arr ) ;

  if ( verbose )
  {
    // print a summary, can use '| grep CF' on cout
    std::cout << typeid(T()).name() << " CF "
              << " D " << dim << " " << arr.shape()
              << " BC " << vspline::bc_name[bc]
              << " DG " << spline_degree
              << " TOL " << bsp.tolerance
              << " EMAX " << emax2 << std::endl ;
  }

  // test the factory functions make_evaluator and make_safe_evaluator

  auto raw_ev = vspline::make_evaluator < spline_type , float > ( bsp ) ;

  typedef typename decltype ( raw_ev ) :: in_type fc_type ;
  typedef typename decltype ( raw_ev ) :: in_nd_ele_type nd_fc_type ;

  // fc_type is what the evaluator expects as input type. This may be a plain
  // fundamental, so we produce an nD view to it for uniform handling.
fc_type cs ( 0 ) ; nd_fc_type & nd_cs ( reinterpret_cast < nd_fc_type & > ( cs ) ) ; auto rs = raw_ev ( cs ) ; raw_ev.eval ( cs , rs ) ; // try evaluating the spline at it's lower and upper limit. We assign the // spline's limit values for each dimension via the nD view to cs: for ( int d = 0 ; d < dim ; d++ ) nd_cs[d] = bsp.lower_limit ( d ) ; raw_ev.eval ( cs , rs ) ; if ( verbose ) std::cout << cs << " -> " << rs << std::endl ; for ( int d = 0 ; d < dim ; d++ ) nd_cs[d] = bsp.upper_limit ( d ) ; raw_ev.eval ( cs , rs ) ; if ( verbose ) std::cout << cs << " -> " << rs << std::endl ; // additionally, we perform a test with a 'safe evaluator' and random // coordinates. For a change we use double precision coordinates auto _ev = vspline::make_safe_evaluator < spline_type , double > ( bsp ) ; enum { vsize = decltype ( _ev ) :: vsize } ; typedef typename decltype ( _ev ) :: in_type coordinate_type ; typedef typename decltype ( _ev ) :: in_ele_type rc_type ; typedef typename decltype ( _ev ) :: out_ele_type ele_type ; // throw a domain in to test that as well: auto dom = vspline::domain < coordinate_type , spline_type , vsize > ( bsp , coordinate_type(-.3377) , coordinate_type(3.11) ) ; auto ev = vspline::grok ( dom + _ev ) ; coordinate_type c ; vigra::MultiArray < 1 , coordinate_type > ca ( vigra::Shape1 ( 10003 ) ) ; vigra::MultiArray < 1 , T > ra ( vigra::Shape1 ( 10003 ) ) ; vigra::MultiArray < 1 , T > ra2 ( vigra::Shape1 ( 10003 ) ) ; auto pc = (rc_type*) &c ; // make sure we can evaluate at the lower and upper limit int k = 0 ; { for ( int e = 0 ; e < dim ; e++ ) pc[e] = 0.0 ; ra[k] = ev ( c ) ; ca[k] = c ; } k = 1 ; { for ( int e = 0 ; e < dim ; e++ ) pc[e] = 1.0 ; ra[k] = ev ( c ) ; ca[k] = c ; } k = 2 ; { for ( int e = 0 ; e < dim ; e++ ) pc[e] = -2.0 ; ra[k] = ev ( c ) ; ca[k] = c ; } k = 3 ; { for ( int e = 2.0 ; e < dim ; e++ ) pc[e] = 1.0 ; ra[k] = ev ( c ) ; ca[k] = c ; } // the remaining coordinates are picked randomly std::uniform_real_distribution<> 
dis2 ( -2.371 , 2.1113 ) ; for ( k = 4 ; k < 10003 ; k++ ) { for ( int e = 0 ; e < dim ; e++ ) pc[e] = dis2 ( gen ) ; ra[k] = ev ( c ) ; ca[k] = c ; } // run an index-based transform. With a domain in action, this // does *not* recreate the original data vspline::transform ( ev , arr ) ; // run an array-based transform. result should be identical // to single-value-eval above, within arithmetic precision limits // given the specific optimization level. // we can produce different views on the data to make sure feeding // the coordinates still works correctly: vigra::MultiArrayView < 2 , coordinate_type > ca2d ( vigra::Shape2 ( 99 , 97 ) , ca.data() ) ; vigra::MultiArrayView < 2 , T > ra2d ( vigra::Shape2 ( 99 , 97 ) , ra2.data() ) ; vigra::MultiArrayView < 2 , coordinate_type > ca2ds ( vigra::Shape2 ( 49 , 49 ) , vigra::Shape2 ( 2 , 200 ) , ca.data() ) ; vigra::MultiArrayView < 2 , T > ra2ds ( vigra::Shape2 ( 49 , 49 ) , vigra::Shape2 ( 2 , 200 ) , ra2.data() ) ; vigra::MultiArrayView < 3 , coordinate_type > ca3d ( vigra::Shape3 ( 20 , 21 , 19 ) , ca.data() ) ; vigra::MultiArrayView < 3 , T > ra3d ( vigra::Shape3 ( 20 , 21 , 19 ) , ra2.data() ) ; vspline::transform ( ev , ca , ra2 ) ; vspline::transform ( ev , ca2d , ra2d ) ; vspline::transform ( ev , ca2ds , ra2ds ) ; vspline::transform ( ev , ca3d , ra3d ) ; // vectorized and unvectorized operation may produce slightly // different results in optimized code. 
// usually assert ( ra == ra2 ) holds, but we don't consider
// it an error if it doesn't and allow for a small difference
auto dsv = check_diff < 1 , T > ( ra , ra2 ) ;
auto tolerance = 10 * std::numeric_limits < ele_type > :: epsilon() ;
if ( dsv > tolerance )
  std::cout << vspline::bc_name[bc]
            << " - max difference single/vectorized eval "
            << dsv << std::endl ;
if ( dsv > .0001 )
{
  std::cout << vspline::bc_name[bc]
            << " - excessive difference single/vectorized eval "
            << dsv << std::endl ;
  for ( int k = 0 ; k < 10003 ; k++ )
  {
    if ( ra[k] != ra2[k] )
    {
      std::cout << "excessive at k = " << k << ": " << ca[k]
                << " -> " << ra[k] << ", " << ra2[k] << std::endl ;
    }
  }
}
return emax ;
}

using namespace vspline ;

template < class arr_t >
double view_test ( arr_t & arr )
{
  double emax = 0.0 ;
  enum { dimension = arr_t::actual_dimension } ;
  typedef typename arr_t::value_type value_type ;
  vspline::bc_code bc_seq[] { PERIODIC , MIRROR , REFLECT , NATURAL } ;
  // TODO: for degree-1 splines, I sometimes get different results
  // for unvectorized and vectorized operation. why?
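The tolerance-based comparison above is needed because vectorized and scalar code may associate floating-point operations differently, and IEEE float addition is not associative. A minimal, vspline-independent illustration (the data values and function names are invented for this sketch; a SIMD horizontal sum typically behaves like the pairwise variant):

```cpp
#include <cassert>
#include <cmath>

// straightforward left-to-right accumulation, as a scalar loop does it
float sum_left ( const float * p , int n )
{
  float s = 0.0f ;
  for ( int i = 0 ; i < n ; i++ )
    s += p[i] ;
  return s ;
}

// pairwise accumulation, the association order a vectorized
// horizontal reduction typically produces
float sum_pairwise ( const float * p , int n )
{
  if ( n == 1 )
    return p[0] ;
  int h = n / 2 ;
  return sum_pairwise ( p , h ) + sum_pairwise ( p + h , n - h ) ;
}
```

With data like { 1e8f , 1.0f , -1e8f , 1.0f } the two orders lose different low-order bits, and on a typical SSE/NEON target the results differ by 1.0 - which is why the test compares against a small tolerance instead of asserting exact equality.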
for ( int spline_degree = 0 ; spline_degree < 8 ; spline_degree++ )
{
  for ( auto bc : bc_seq )
  {
    auto e = restore_test < dimension , value_type > ( arr , bc , spline_degree ) ;
    if ( e > emax ) emax = e ;
  }
}
return emax ;
}

int d0[] { 1 , 2 , 3 , 5 , 8 , 13 , 16 , 21 , 34 , 55 , 123 , 128 , 289 , 500 , 1031 , 2001 , 4999 } ;
int d1[] { 1 , 2 , 3 , 5 , 8 , 13 , 16 , 21 , 34 , 89 , 160 , 713 } ;
int d2[] { 2 , 3 , 5 , 8 , 13 , 21 } ;
int d3[] { 2 , 3 , 5 , 8 , 13 } ;

int* dext[] { d0 , d1 , d2 , d3 } ;
int dsz[] { sizeof(d0) / sizeof(int) ,
            sizeof(d1) / sizeof(int) ,
            sizeof(d2) / sizeof(int) ,
            sizeof(d3) / sizeof(int) } ;

template < int dim , typename T >
struct test
{
  typedef vigra::TinyVector < int , dim > shape_type ;
  double emax = 0.0 ;

  double operator() ()
  {
    shape_type dshape ;
    for ( int d = 0 ; d < dim ; d++ )
      dshape[d] = dsz[d] ;
    vigra::MultiCoordinateIterator < dim > i ( dshape ) ,
                                           end = i.getEndIterator() ;
    while ( i != end )
    {
      shape_type shape ;
      for ( int d = 0 ; d < dim ; d++ )
        shape[d] = * ( dext[d] + (*i)[d] ) ;
      vigra::MultiArray < dim , T > _arr ( 2 * shape + 1 ) ;
      auto stride = _arr.stride() * 2 ;
      vigra::MultiArrayView < dim , T >
        arr ( shape , stride , _arr.data() + long ( sum ( stride ) ) ) ;
      auto e = view_test ( arr ) ;
      if ( e > emax ) emax = e ;
      ++i ;
      // make sure that we have only written back to 'arr', leaving
      // _arr untouched
      for ( auto & e : arr ) e = T(0.0) ;
      for ( auto e : _arr ) assert ( e == T(0.0) ) ;
    }
    return emax ;
  }
} ;

template < typename T >
struct test < 0 , T >
{
  double operator() () { return 0.0 ; }
} ;

template < int dim , typename tuple_type ,
           int ntypes = std::tuple_size < tuple_type > :: value >
struct multitest
{
  void operator() ()
  {
    typedef typename std::tuple_element < ntypes - 1 , tuple_type > :: type T ;
    auto e = test < dim , T >() () ;
    std::cout << "test for type " << typeid(T()).name()
              << ": max error = " << e << std::endl ;
    multitest < dim , tuple_type , ntypes - 1 >() () ;
  }
} ;

template < int dim , typename tuple_type >
struct multitest < dim , tuple_type , 0 >
{
  void
operator() () { }
} ;

int main ( int argc , char * argv[] )
{
  std::cout << std::fixed << std::showpos << std::showpoint << std::setprecision(18);
  std::cerr << std::fixed << std::showpos << std::showpoint << std::setprecision(18);

  int test_dim = 2 ;
  if ( argc > 1 )
    test_dim = std::atoi ( argv[1] ) ;
  if ( test_dim > 4 )
    test_dim = 4 ;
  std::cout << "testing with " << test_dim << " dimensions" << std::endl ;

  typedef std::tuple < vigra::TinyVector < double , 1 > ,
                       vigra::TinyVector < float , 1 > ,
                       vigra::TinyVector < double , 2 > ,
                       vigra::TinyVector < float , 2 > ,
                       vigra::TinyVector < long double , 3 > ,
                       vigra::TinyVector < double , 3 > ,
                       vigra::TinyVector < float , 3 > ,
                       vigra::RGBValue < long double > ,
                       vigra::RGBValue < double > ,
                       vigra::RGBValue < float > ,
                       std::complex < long double > ,
                       std::complex < double > ,
                       std::complex < float > ,
                       long double , double , float > tuple_type ;

  switch ( test_dim )
  {
    case 1:
      multitest < 1 , tuple_type >() () ;
      break ;
    case 2:
      multitest < 2 , tuple_type >() () ;
      break ;
    // case 3:
    //   multitest < 3 , tuple_type >() () ;
    //   break ;
    // case 4:
    //   multitest < 4 , tuple_type >() () ;
    //   break ;
    default:
      break ;
  }
  std::cout << "terminating" << std::endl ;
}

kfj-vspline-4b365417c271/example/roundtrip.cc

/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F.
Jahnke */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /// \file roundtrip.cc /// /// \brief benchmarking and testing code for various vspline capabilities /// /// load an image, create a b-spline from it, and restore the original data, /// both by normal evaluation and by convolution with the reconstruction kernel. /// all of this is done several times each with different boundary conditions, /// spline degrees and in float and double arithmetic, the processing times /// and differences between input and restored signal are printed to cout. /// /// obviously, this is not a useful program, it's to make sure the engine functions as /// intended and all combinations of float and double as values and coordinates compile /// and function as intended, also giving an impression of the speed of processing. 
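To see what such a prefilter/restore roundtrip boils down to, here is a dependency-free 1D sketch - this is not vspline code; it hard-codes the cubic case with MIRROR boundaries, using the standard recursive prefilter with the single pole z1 = sqrt(3) - 2 and the cubic reconstruction kernel (1/6, 4/6, 1/6); the function name is invented for this sketch:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// prefilter a 1D signal for a cubic b-spline (mirror boundaries),
// then restore it at the knot points and return the maximum
// restoration error - which should sit near arithmetic precision
double roundtrip_max_error()
{
  const double z1 = std::sqrt ( 3.0 ) - 2.0 ;
  const int n = 64 ;
  std::vector < double > s ( n ) , c ( n ) ;
  for ( int k = 0 ; k < n ; k++ )
    s[k] = std::sin ( 0.3 * k ) ;

  // prefilter: scale by the overall gain, then one causal and one
  // anticausal recursive pass (the standard scheme for mirror BCs)
  const double lambda = ( 1.0 - z1 ) * ( 1.0 - 1.0 / z1 ) ; // == 6 for the cubic
  for ( int k = 0 ; k < n ; k++ )
    c[k] = s[k] * lambda ;

  double sum = c[0] ; // truncated init of the causal pass
  double zk = z1 ;
  for ( int k = 1 ; k < n ; k++ , zk *= z1 )
    sum += zk * c[k] ;
  c[0] = sum ;
  for ( int k = 1 ; k < n ; k++ )
    c[k] += z1 * c[k-1] ;

  c[n-1] = ( z1 / ( z1 * z1 - 1.0 ) ) * ( c[n-1] + z1 * c[n-2] ) ;
  for ( int k = n - 2 ; k >= 0 ; k-- )
    c[k] = z1 * ( c[k+1] - c[k] ) ;

  // restore the signal at the knot points by convolving the
  // coefficients with the reconstruction kernel (1/6, 4/6, 1/6),
  // mirroring coefficient indices at the margins
  double emax = 0.0 ;
  for ( int k = 0 ; k < n ; k++ )
  {
    int km = ( k > 0 ) ? k - 1 : 1 ;
    int kp = ( k < n - 1 ) ? k + 1 : n - 2 ;
    double r = ( c[km] + 4.0 * c[k] + c[kp] ) / 6.0 ;
    emax = std::max ( emax , std::abs ( r - s[k] ) ) ;
  }
  return emax ;
}
```

This is the same prefilter-then-reconstruct cycle roundtrip.cc exercises below, minus the multithreading, SIMD, and nD machinery vspline adds on top.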
/// compile:
/// clang++ -std=c++11 -march=native -o roundtrip -O3 -pthread -DUSE_VC roundtrip.cc -lvigraimpex -lVc
/// or: clang++ -std=c++11 -march=native -o roundtrip -O3 -pthread roundtrip.cc -lvigraimpex
///
/// invoke: roundtrip <image_file>
///
/// there is no image output.

#include <iostream>
#include <iomanip>
#include <vspline/vspline.h>
#include <vigra/multi_math.hxx>
#include <vigra/accumulator.hxx>
#include <vigra/imageinfo.hxx>
#include <vigra/impex.hxx>

#define PRINT_ELAPSED

#ifdef PRINT_ELAPSED
#include <chrono>
#include <ctime>
#endif

using namespace std ;
using namespace vigra ;

/// check for differences between two arrays

template < class view_type >
long double check_diff ( const view_type& a , const view_type& b )
{
  using namespace vigra::multi_math ;
  using namespace vigra::acc;
  typedef typename view_type::value_type value_type ;
  typedef typename vigra::ExpandElementResult < value_type > :: type real_type ;
  typedef MultiArray < view_type::actual_dimension , real_type > error_array ;

  error_array ea ( vigra::multi_math::squaredNorm ( b - a ) ) ;
  AccumulatorChain < real_type , Select < Mean , Maximum > > ac ;
  extractFeatures(ea.begin(), ea.end(), ac);
  std::cout << "warped image diff Mean: " << sqrt(get<Mean>(ac)) << std::endl;
  long double max_error = sqrt(get<Maximum>(ac)) ;
  std::cout << "warped image diff Maximum: " << max_error << std::endl ;
  // if ( max_error > .01 )
  // {
  //   for ( int y = 0 ; y < a.shape(1) ; y++ )
  //   {
  //     for ( int x = 0 ; x < a.shape(0) ; x++ )
  //     {
  //       auto pa = a ( x , y ) ;
  //       auto pb = b ( x , y ) ;
  //       auto na = pa[0] * pa[0] + pa[1] * pa[1] + pa[2] * pa[2] ;
  //       auto nb = pb[0] * pb[0] + pb[1] * pb[1] + pb[2] * pb[2] ;
  //       if ( std::abs ( na - nb ) > .01 )
  //         std::cout << "excess at ( " << x << " , "
  //                   << y << " )" << std::endl ;
  //     }
  //   }
  // }
  return max_error ;
}

template < class view_type , typename real_type , typename rc_type , int specialize >
long double run_test ( view_type & data ,
                       vspline::bc_code bc ,
                       int DEGREE ,
                       int TIMES = 32 )
{
  typedef typename view_type::value_type pixel_type ;
  typedef typename view_type::difference_type Shape;
  typedef MultiArray < 2 , pixel_type > array_type ;
  typedef int int_type ;

  long double max_error = 0.0L ;
  long double error ;

  //
// we use simdized types with as many elements as vector_traits
// considers appropriate for a given real_type, which is the elementary
// type of the (pixel) data we process:

const int vsize = vspline::vector_traits < real_type > :: size ;

vspline::bcv_type < view_type::actual_dimension > bcv ( bc ) ;

int Nx = data.width() ;
int Ny = data.height() ;

vspline::bspline < pixel_type , 2 > bsp ( data.shape() , DEGREE , bcv ) ;
bsp.core = data ;

// first test: time prefilter

#ifdef PRINT_ELAPSED
std::chrono::system_clock::time_point start = std::chrono::system_clock::now();
std::chrono::system_clock::time_point end ;
#endif

for ( int times = 0 ; times < TIMES ; times++ )
  bsp.prefilter() ;

#ifdef PRINT_ELAPSED
end = std::chrono::system_clock::now();
cout << "avg " << TIMES << " x prefilter:............................ "
     << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() / float(TIMES)
     << " ms" << endl ;
#endif

// to time and test 1D operation, we pretend the data are 1D,
// prefilter and restore them.

start = std::chrono::system_clock::now();

// cast data to 1D array
vigra::MultiArrayView < 1 , pixel_type >
  fake_1d_array ( vigra::Shape1 ( prod ( data.shape() ) ) , data.data() ) ;

vigra::TinyVector < vspline::bc_code , 1 > bcv1 ( bcv[0] ) ;
vspline::bspline < pixel_type , 1 > bsp1 ( fake_1d_array.shape() , DEGREE , bcv1 ) ;
bsp1.core = fake_1d_array ;

for ( int times = 0 ; times < TIMES ; times++ )
  bsp1.prefilter() ;

#ifdef PRINT_ELAPSED
end = std::chrono::system_clock::now();
cout << "avg " << TIMES << " x prefilter as fake 1D array:........... 
" << std::chrono::duration_cast(end - start).count() / float(TIMES) << " ms" << endl ; #endif // use fresh data, data above are useless after TIMES times filtering bsp1.core = fake_1d_array ; bsp1.prefilter() ; start = std::chrono::system_clock::now(); vigra::MultiArray < 1 , pixel_type > fake_1d_target ( vigra::Shape1 ( prod ( data.shape() ) ) ) ; vspline::restore < 1 , pixel_type > ( bsp1 , fake_1d_target ) ; for ( int times = 1 ; times < TIMES ; times++ ) vspline::restore < 1 , pixel_type > ( bsp1 , fake_1d_target ) ; #ifdef PRINT_ELAPSED end = std::chrono::system_clock::now(); cout << "avg " << TIMES << " x restore original data from 1D:........ " << std::chrono::duration_cast(end - start).count() / float(TIMES) << " ms" << endl ; #endif cout << "difference original data/restored data:" << endl ; error = check_diff ( fake_1d_array , fake_1d_target ) ; if ( error > max_error ) max_error = error ; // that's all of the 1D testing we do, back to the 2D data. // use fresh data, data above are useless after TIMES times filtering bsp.core = data ; bsp.prefilter() ; // get a view to the core coefficients (those which aren't part of the brace) view_type cfview = bsp.core ; // set the coordinate type typedef vigra::TinyVector < rc_type , 2 > coordinate_type ; // // set the evaluator type // typedef vspline::evaluator eval_type ; // // // create the evaluator for the b-spline, using plain evaluation (no derivatives) // // eval_type ev ( bsp ) ; // spline typedef vspline::bspline < pixel_type , 2 > spline_type ; auto ev = vspline::make_safe_evaluator < spline_type , rc_type > ( bsp ) ; // type for coordinate array typedef vigra::MultiArray<2, coordinate_type> coordinate_array ; int Tx = Nx ; int Ty = Ny ; // now we create a warp array of coordinates at which the spline will be evaluated. // Also create a target array to contain the result. 
coordinate_array fwarp ( Shape ( Tx , Ty ) ) ;
array_type _target ( Shape(Tx,Ty) ) ;
view_type target ( _target ) ;

rc_type dfx = 0.0 , dfy = 0.0 ; // currently evaluating right at knot point locations

for ( int times = 0 ; times < 1 ; times++ )
{
  for ( int y = 0 ; y < Ty ; y++ )
  {
    rc_type fy = (rc_type)(y) + dfy ;
    for ( int x = 0 ; x < Tx ; x++ )
    {
      rc_type fx = (rc_type)(x) + dfx ;
      // store the coordinate to fwarp[x,y]
      fwarp [ Shape ( x , y ) ] = coordinate_type ( fx , fy ) ;
    }
  }
}

// second test. perform a transform using fwarp as warp array. Since fwarp contains
// the discrete coordinates to the knot points, converted to float, the result
// should be the same as the input within the given precision

#ifdef PRINT_ELAPSED
start = std::chrono::system_clock::now();
#endif

for ( int times = 0 ; times < TIMES ; times++ )
  vspline::transform ( ev , fwarp , target ) ;

#ifdef PRINT_ELAPSED
end = std::chrono::system_clock::now();
cout << "avg " << TIMES << " x transform with ready-made bspline:.... "
     << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() / float(TIMES)
     << " ms" << endl ;
#endif

error = check_diff ( target , data ) ;
if ( error > max_error ) max_error = error ;

// third test: do the same with the 'classic' remap routine which internally creates
// a b-spline

#ifdef PRINT_ELAPSED
start = std::chrono::system_clock::now();
#endif

for ( int times = 0 ; times < TIMES ; times++ )
  vspline::remap ( data , fwarp , target , bcv , DEGREE ) ;

#ifdef PRINT_ELAPSED
end = std::chrono::system_clock::now();
cout << "avg " << TIMES << " x classic remap:........................ "
     << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() / float(TIMES)
     << " ms" << endl ;
#endif

error = check_diff ( target , data ) ;
if ( error > max_error ) max_error = error ;

// fourth test: perform a transform() directly using the b-spline evaluator
// as the transform's functor.
// This is, yet again, the same, because
// it evaluates at all discrete positions, but now without the warp array:
// the index-based transform feeds the evaluator with the discrete coordinates.

#ifdef PRINT_ELAPSED
start = std::chrono::system_clock::now();
#endif

for ( int times = 0 ; times < TIMES ; times++ )
  vspline::transform ( ev , target ) ;

#ifdef PRINT_ELAPSED
end = std::chrono::system_clock::now();
cout << "avg " << TIMES << " x index-based transform................. "
     << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() / float(TIMES)
     << " ms" << endl ;
#endif

cout << "difference original data/restored data:" << endl ;
error = check_diff ( target , data ) ;
if ( error > max_error ) max_error = error ;

// fifth test: use 'restore' which internally uses convolution. This is
// usually slightly faster than the previous way to restore the original
// data, but otherwise makes no difference.

#ifdef PRINT_ELAPSED
start = std::chrono::system_clock::now();
#endif

for ( int times = 0 ; times < TIMES ; times++ )
  vspline::restore < 2 , pixel_type > ( bsp , target ) ;

#ifdef PRINT_ELAPSED
end = std::chrono::system_clock::now();
cout << "avg " << TIMES << " x restore original data: .............. 
" << std::chrono::duration_cast(end - start).count() / float(TIMES) << " ms" << endl ; #endif cout << "difference original data/restored data:" << endl ; error = check_diff ( data , target ) ; if ( error > max_error ) max_error = error ; cout << endl ; return max_error ; } template < class real_type , class rc_type > long double process_image ( char * name ) { long double max_error = 0.0L ; long double error ; cout << fixed << showpoint << setprecision(16) ; // the import and info-displaying code is taken from vigra: vigra::ImageImportInfo imageInfo(name); // print some information std::cout << "Image information:\n"; std::cout << " file format: " << imageInfo.getFileType() << std::endl; std::cout << " width: " << imageInfo.width() << std::endl; std::cout << " height: " << imageInfo.height() << std::endl; std::cout << " pixel type: " << imageInfo.getPixelType() << std::endl; std::cout << " color image: "; if (imageInfo.isColor()) std::cout << "yes ("; else std::cout << "no ("; std::cout << "number of channels: " << imageInfo.numBands() << ")\n"; typedef vigra::RGBValue pixel_type; typedef vigra::MultiArray<2, pixel_type> array_type ; typedef vigra::MultiArrayView<2, pixel_type> view_type ; // to test that strided data are processed correctly, we load the image // to an inner subarray of containArray // array_type containArray(imageInfo.shape()+vigra::Shape2(3,5)); // view_type imageArray = containArray.subarray(vigra::Shape2(1,2),vigra::Shape2(-2,-3)) ; // alternatively, just use the same for both array_type containArray ( imageInfo.shape() ); view_type imageArray ( containArray ) ; vigra::importImage(imageInfo, imageArray); // test these boundary conditions: vspline::bc_code bcs[] = { vspline::MIRROR , vspline::REFLECT , vspline::NATURAL , vspline::PERIODIC } ; for ( int b = 0 ; b < 4 ; b++ ) { vspline::bc_code bc = bcs[b] ; for ( int spline_degree = 1 ; spline_degree < 8 ; spline_degree++ ) { #if defined USE_VC cout << "testing bc code " << vspline::bc_name[bc] 
<< " spline degree " << spline_degree << " using Vc" << endl ; #else cout << "testing bc code " << vspline::bc_name[bc] << " spline degree " << spline_degree << " using SIMD emulation" << endl ; #endif if ( spline_degree == 0 ) { std::cout << "using specialized evaluator" << std::endl ; error = run_test < view_type , real_type , rc_type , 0 > ( imageArray , bc , spline_degree ) ; if ( error > max_error ) max_error = error ; std::cout << "using unspecialized evaluator" << std::endl ; error = run_test < view_type , real_type , rc_type , -1 > ( imageArray , bc , spline_degree ) ; if ( error > max_error ) max_error = error ; } else if ( spline_degree == 1 ) { std::cout << "using specialized evaluator" << std::endl ; error = run_test < view_type , real_type , rc_type , 1 > ( imageArray , bc , spline_degree ) ; if ( error > max_error ) max_error = error ; std::cout << "using unspecialized evaluator" << std::endl ; error = run_test < view_type , real_type , rc_type , -1 > ( imageArray , bc , spline_degree ) ; if ( error > max_error ) max_error = error ; } else { error = run_test < view_type , real_type , rc_type , -1 > ( imageArray , bc , spline_degree ) ; if ( error > max_error ) max_error = error ; } } } return max_error ; } int main ( int argc , char * argv[] ) { long double max_error = 0.0L ; long double error ; cout << "testing float data, float coordinates" << endl ; error = process_image ( argv[1] ) ; if ( error > max_error ) max_error = error ; cout << "max error of float/float test: " << error << std::endl << std::endl ; cout << endl << "testing double data, double coordinates" << endl ; error = process_image ( argv[1] ) ; if ( error > max_error ) max_error = error ; cout << "max error of double/double test: " << error << std::endl << std::endl ; cout << endl << "testing long double data, float coordinates" << endl ; error = process_image ( argv[1] ) ; if ( error > max_error ) max_error = error ; cout << "max error of ldouble/float test: " << error << std::endl 
<< std::endl ;

  cout << endl << "testing long double data, double coordinates" << endl ;
  error = process_image < long double , double > ( argv[1] ) ;
  if ( error > max_error ) max_error = error ;
  cout << "max error of ldouble/double test: " << error << std::endl << std::endl ;

  cout << "testing float data, double coordinates" << endl ;
  error = process_image < float , double > ( argv[1] ) ;
  if ( error > max_error ) max_error = error ;
  cout << "max error of float/double test: " << error << std::endl << std::endl ;

  cout << endl << "testing double data, float coordinates" << endl ;
  error = process_image < double , float > ( argv[1] ) ;
  if ( error > max_error ) max_error = error ;
  cout << "max error of double/float test: " << error << std::endl << std::endl ;

  cout << "reached end. max error of all tests: " << max_error << std::endl ;
}

kfj-vspline-4b365417c271/example/scope_test.cc

/*! \file scope_test.cc

    \brief tries to fathom vspline's scope

    vspline is very flexible when it comes to types and other compile-time
    issues. It's not really feasible to instantiate every single possible
    combination of compile-time parameters, but this program is an attempt
    to test a few relevant ones, specifically for vspline's 'higher' API,
    namely classes bspline and evaluator, and the functions in transform.h.

    We test data in 1, 2 and 3 dimensions. For individual values, we use
    plain fundamentals and TinyVectors of two and three fundamentals, with
    the fundamental types float, double and long double. Wherever possible,
    we perform the tests with several vectorization widths, to make sure
    that the wielding code performs as expected.

    For every such combination, we perform this test sequence:

    - produce random data and random coordinates
    - create a b-spline with given degree and boundary conditions
      from the data
    - create a 'safe evaluator' for the spline
    - use the evaluator to bulk-evaluate the random coordinates
    - and to evaluate them one-by-one; compare the result
    - restore the original data from the coefficients using an
      index-based transform and also using vspline::restore;
      compare the results.

    On my system, this is just about what the compiler can handle, and the
    compilation takes a long time. The test can be limited to smaller
    parameter sets; a good place to narrow the scope is by picking fewer
    rc_type/ele_type combinations in doit().
This test passes if

    - the code compiles at all
    - the program runs until completion
    - the errors are near the resolution you'd expect from the
      types involved

    so you'd expect output like

    ... vsz +5 chn +2 trg_dim +3 cf_dim +3
    float crd, float value
    d Mean: +0.000000763713326440 d Max: +0.000007158763497749
    d Mean: +0.000000757349428671 d Max: +0.000007168434323489
    double crd, double value
    d Mean: +0.000000000000009688 d Max: +0.000000000000899253
    d Mean: +0.000000000000009681 d Max: +0.000000000000899253
    long double crd, long double value
    d Mean: +0.000000000000000001 d Max: +0.000000000000000006
    d Mean: +0.000000000000000001 d Max: +0.000000000000000006
    ...

    This test does not focus on different spline degrees or boundary
    conditions; its purpose is to test compile-time parametrization.
    But it can easily be modified to use different spline degrees and
    boundary conditions by passing the relevant parameters to test();
    I have limited the choice to a few 'typical' combinations.

    Another aspect which is not tested is variation of the extents of
    the arrays of data involved. Other tests, like 'restore_test.cc'
    and 'roundtrip.cc', focus more on these run-time parameters.
    Testing varying vectorization widths is for completeness' sake;
    normally, using the defaults should give good performance.
*/

#include <iostream>
#include <iomanip>
#include <random>
#include <cassert>
#include <vspline/vspline.h>
#include <vigra/multi_math.hxx>
#include <vigra/accumulator.hxx>

bool verbose = true ; // false ;

// 'condense' aggregate types (TinyVectors etc.)
// into a single value

template < typename T >
double condense ( const T & t , std::true_type )
{
  return std::abs ( t ) ;
}

template < typename T >
double condense ( const T & t , std::false_type )
{
  return sqrt ( sum ( t * t ) ) / t.size() ;
}

template < typename T >
double condense ( const std::complex < T > & t , std::false_type )
{
  return std::abs ( t ) ;
}

template < class T >
using is_singular = typename std::conditional
  < std::is_fundamental < T > :: value ,
    std::true_type ,
    std::false_type
  > :: type ;

template < typename T >
double condense ( const T & t )
{
  return condense ( t , is_singular < T > () ) ;
}

// compare two arrays and calculate the mean and maximum difference
// here we also take the 'value_extent' - the largest coefficient
// value we have used - to put the error in relation, so that the high
// coefficient values we use for integral coefficients don't produce
// results which look wronger than they are ;)

template < unsigned int dim , typename T >
double check_diff ( vigra::MultiArrayView < dim , T > & reference ,
                    vigra::MultiArrayView < dim , T > & candidate ,
                    double value_extent )
{
  using namespace vigra::multi_math ;
  using namespace vigra::acc;

  assert ( reference.shape() == candidate.shape() ) ;

  vigra::MultiArray < 1 , double > error_array ( vigra::Shape1(reference.size() ) ) ;
  for ( int i = 0 ; i < reference.size() ; i++ )
  {
    auto help = candidate[i] - reference[i] ;
    error_array [ i ] = condense ( help ) ;
  }
  AccumulatorChain < double , Select < Mean, Maximum > > ac ;
  extractFeatures ( error_array.begin() , error_array.end() , ac ) ;
  double mean = get<Mean>(ac) ;
  double max = get<Maximum>(ac) ;
  if ( verbose )
  {
    std::cout << "rel. error Mean: " << mean / value_extent
              << " rel. error Max: " << max / value_extent << std::endl;
  }
  return max ;
}

using namespace vspline ;

#define ast(x) std::integral_constant < int , x >

// with this test routine, the idea is to test all of vspline's
// higher functions with all possible compile time parameters.
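The condense helpers above rely on the classic tag-dispatch idiom: a trait (is_singular) maps a type to std::true_type or std::false_type, and overload resolution then routes scalars and aggregates to different implementations. Here is a standalone toy version of the same idiom, using std::array in place of vigra::TinyVector - the names are invented for this sketch:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <type_traits>

// scalar case: fundamentals are 'condensed' to their absolute value
template < typename T >
double magnitude ( const T & t , std::true_type )
{
  return std::abs ( double ( t ) ) ;
}

// aggregate case: reduce to a euclidean norm divided by the number of
// channels, mirroring condense's sqrt ( sum ( t * t ) ) / t.size()
template < typename T , std::size_t N >
double magnitude ( const std::array < T , N > & t , std::false_type )
{
  double s = 0.0 ;
  for ( auto e : t )
    s += double ( e ) * double ( e ) ;
  return std::sqrt ( s ) / N ;
}

// dispatcher: pick the overload via a compile-time tag
template < typename T >
double magnitude ( const T & t )
{
  return magnitude ( t ,
                     typename std::conditional
                       < std::is_fundamental < T > :: value ,
                         std::true_type ,
                         std::false_type > :: type() ) ;
}
```

The dispatch happens entirely at compile time; the unused overload for a given T is never instantiated with the wrong tag, which is what lets the aggregate branch use member functions a fundamental type doesn't have.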
template < unsigned int cf_dim ,    // dimension of spline/coefficient array
           unsigned int trg_dim ,   // dimension of target (result) array
           int chn ,                // number of channels in spline's data type
           int vsz ,                // vectorization width
           typename rc_type ,       // elementary type of a real coordinate
           typename ele_type ,      // elementary type of coefficients
           typename math_ele_type > // elementary type used for arithmetics
void test ( int spline_degree = 3 ,
            vspline::bc_code bc = vspline::MIRROR )
{
  // TODO: when operating on integral values, vectorized and unvectorized
  // operation sometimes produces results differing by at most 1, hence
  // this expression for 'tolerance'. I'm not sure why this happens, it
  // looks like different results of rounding/truncation; I should track
  // this down to make sure the logic isn't flawed somewhere

  double tolerance = std::is_integral < ele_type > :: value
                     ? 1.0
                     : 0.000001 ;

  // for integral coefficients, we use high knot point values to
  // provide sufficient dynamic range. the exact values are chosen
  // in a slightly ad-hoc manner, but they should demonstrate that
  // more bits can provide better results.

  double value_extent = std::is_integral < ele_type > :: value
                        ? sizeof ( ele_type ) == 2
                          ? 1000.0
                          : sizeof ( ele_type ) == 4
                            ? 1000000.0
                            : sizeof ( ele_type ) == 8
                              ?
1000000000000.0
                              : 100.0
                        : 1.0 ;

  typedef typename vigra::MultiArrayShape < cf_dim > :: type cf_shape_t ;
  typedef typename vigra::MultiArrayShape < trg_dim > :: type trg_shape_t ;

  typedef typename std::conditional
    < chn == 1 ,
      ele_type ,
      vigra::TinyVector < ele_type , chn >
    > :: type dtype ;

  typedef typename std::conditional
    < cf_dim == 1 ,
      rc_type ,
      vigra::TinyVector < rc_type , cf_dim >
    > :: type crd_type ;

  // allocate storage for original data
  cf_shape_t cf_shape { 99 } ;
  vigra::MultiArray < cf_dim , dtype > original ( cf_shape ) ;
  vigra::MultiArray < cf_dim , dtype > restored ( cf_shape ) ;

  // and storage for coordinates and results
  trg_shape_t trg_shape { 101 } ;
  vigra::MultiArray < trg_dim , crd_type > coordinates ( trg_shape ) ;
  vigra::MultiArray < 1 , rc_type > crd_1d { 101 } ;
  vigra::MultiArray < trg_dim , dtype > target ( trg_shape ) ;
  vigra::MultiArray < trg_dim , double > d_target ( trg_shape ) ;
  cf_shape_t gcf_shape { 101 } ;
  vigra::MultiArray < cf_dim , dtype > ge_target ( gcf_shape ) ;
  vigra::MultiArray < cf_dim , dtype > ge_target_2 ( gcf_shape ) ;

  // produce random original data. We produce some very high values
  // here, since we want to use them also for testing processing of
  // integral data, where we want to exhaust the dynamic range of the
  // integral type, since we can't have fractional digits. The results
  // for processing floats will therefore have errors in the order of
  // magnitude of 1, rather than the usual errors around 1e-5 for
  // float data, when the test data are small numbers in [0,1]

  std::random_device rd ;
  std::mt19937 gen ( rd() ) ;
  std::uniform_real_distribution<> dis ( - value_extent , value_extent ) ;
  auto data_ele_view = original.expandElements ( 0 ) ;
  for ( auto & e : data_ele_view )
    e = dis ( gen ) ;

  // produce random coordinates. Note that many of these coordinates
  // will be out-of-range. They are folded into the range by the
  // 'safe evaluator', so this feature is also tested.
  // we deliberately overshoot the defined range by more than the first
  // reflection to be sure the extrapolation works as intended.

  std::uniform_real_distribution<> crd_dis ( -300.0 , 300.0 ) ;

  auto crd_ele_view = coordinates.expandElements ( 0 ) ;
  for ( auto & e : crd_ele_view )
    e = crd_dis ( gen ) ;

  // produce a set of 'safe' 1D coordinates to test grid_eval.
  // grid_eval can't handle out-of-range coordinates, we have to
  // stay within the spline's safe range.

  std::uniform_real_distribution<> crd1_dis ( 0.0 , 98.0 ) ;

  for ( auto & e : crd_1d )
    e = crd1_dis ( gen ) ;

  // passing 'naked' pointers to grid_eval is now deprecated:

  // rc_type * grid [ cf_dim ] ;
  // for ( int d = 0 ; d < cf_dim ; d++ )
  //   grid [ d ] = crd_1d.data() ;

  grid_spec < cf_dim , rc_type > grid ;

  for ( int d = 0 ; d < cf_dim ; d++ )
    grid [ d ] = crd_1d ;

  // create a b-spline over the data, prefilter it and create
  // an evaluator. note how we pass rc_type to make_safe_evaluator.
  // we test with the same boundary conditions along all axes.

  vigra::TinyVector < vspline::bc_code , cf_dim > bcv ( bc ) ;

  typedef vspline::bspline < dtype , cf_dim > spline_type ;
  spline_type bspl ( cf_shape , spline_degree , bcv ) ;

  // we instantiate prefilter with the given vsz to make sure that
  // we can indeed pick arbitrary vectorization widths.

  bspl.template prefilter < dtype , math_ele_type , vsz > ( original ) ;

  // we instantiate ev with the given vsz to make sure that
  // we can indeed pick arbitrary vectorization widths

  enum { ev_vsz = vsz } ;

  auto ev = vspline::make_safe_evaluator
            < spline_type , rc_type , ev_vsz , math_ele_type > ( bspl ) ;

  // run an array-based transform with the coordinates

  vspline::transform ( ev , coordinates , target ) ;

  // check the result against single-value evaluation.
  // here we expect to see precisely equal results.
// note that in this test, we can't know what the // correct result should be, we only make sure that // vectorized evaluation and unvectorized evaluation // produce near identical results. auto it = target.begin() ; for ( auto const & e : coordinates ) { // TODO when working on int coefficients, I don't get total equality assert ( condense ( *it - ev ( e ) ) <= tolerance ) ; ++it ; } // create a safe evaluator with specified target type 'double' // and process the coordinates in 'coordinates' with it. Check // single-eval results against the result of 'transform'. auto evd = vspline::make_safe_evaluator < spline_type , rc_type , ev_vsz , math_ele_type , double > ( bspl ) ; vspline::transform ( evd , coordinates , d_target ) ; auto itd = d_target.begin() ; for ( auto const & e : coordinates ) { assert ( condense ( *itd - evd ( e ) ) <= tolerance ) ; ++itd ; } // create an evaluator over the spline to pass it to grid_eval. // here we need a 'raw' evaluator, not the type make_safe_evaluator // or make_evaluator would produce. auto gev = vspline::evaluator < crd_type , dtype , ev_vsz , -1 , math_ele_type > ( bspl ) ; // call grid_eval. the result is written to ge_target. // also use gen_grid_eval to doublecheck that both functions // produce identical output. vspline::grid_eval ( grid , gev , ge_target ) ; vspline::gen_grid_eval ( grid , gev , ge_target_2 ) ; // Now we 'manually' iterate over the coordinates in the grid // and compare the result to the content of ge_target to make sure // grid_eval has worked correctly. crd_type c ; auto & cc = reinterpret_cast < vigra::TinyVector < rc_type , cf_dim > & > ( c ) ; vigra::MultiCoordinateIterator < cf_dim > mci ( gcf_shape ) ; auto itg2 = ge_target_2.begin() ; for ( auto & ref : ge_target ) { // fill in the coordinate for ( int d = 0 ; d < cf_dim ; d++ ) cc [ d ] = grid [ d ] [ (*mci) [ d ] ] ; // evaluate at the coordinate and compare to grid_eval's output. 
    // we expect equality of results here, since the arithmetic
    // is near identical for both ways of generating the result

    assert ( ref == gev ( c ) ) ;
    assert ( ref == *itg2 ) ;
    ++ mci ;
    ++itg2 ;
  }

  // restore original data, first by using an index-based
  // transform, then by calling 'restore' (using convolution)
  // directly on the spline, producing the restored data in
  // the spline's core.
  // Note how, when working on integral data, the index-based
  // transform produced wildly wrong results due to the fact
  // that the evaluation was done entirely in the integral type.
  // The new evaluation code can, if math_ele_type is real,
  // produce reasonable results which only suffer from
  // quantization errors, which was only possible with
  // restoration by convolution in the last version.

  vspline::transform ( ev , restored ) ;

  // we instantiate restore with the given vsz to make sure that
  // we can indeed pick arbitrary vectorization widths.

  vspline::restore < cf_dim , dtype , math_ele_type , vsz > ( bspl ) ;

  // compare the data obtained from the two restoration methods.
  // due to the slightly different arithmetic involved, we expect
  // similar, but not necessarily identical results.

  check_diff ( original , restored , value_extent ) ;
  check_diff ( original , bspl.core , value_extent ) ;
}

template < typename vsz_t ,
           typename chn_t ,
           typename trg_dim_t ,
           typename cf_dim_t ,
           typename ... other_types >
void doit()
{
  enum { vsz = vsz_t::value } ;
  enum { chn = chn_t::value } ;
  enum { trg_dim = trg_dim_t::value } ;
  enum { cf_dim = cf_dim_t::value } ;

  std::cout << "vsz " << vsz << " chn " << chn
            << " trg_dim " << trg_dim << " cf_dim " << cf_dim << std::endl ;

  // we do a few exemplary runs - if we tried to exhaust all
  // possible combinations of data types, spline degrees and
  // boundary conditions, this would take forever, and we already
  // have code in restore_test exploring that way.
  // Here we're more interested in making sure that 1D and nD data and
  // real and integral types are processed as expected and that
  // all 'high-level' capabilities of vspline are invoked.

  // a very 'standard' scenario: cubic b-spline, all in float:

  std::cout << "float crd, float value " << std::endl ;

  test < cf_dim , trg_dim , chn , vsz , float , float , float >
    ( 3 , vspline::MIRROR ) ;

  // testing an integral-valued spline. With int/short data type,
  // we can use vectorization. If we use long instead, vsize
  // automatically falls back to 1 for evaluation.

  std::cout << "float crd, short value " << std::endl ;

  test < cf_dim , trg_dim , chn , vsz , float , short , float >
    ( 2 , vspline::MIRROR ) ;

  std::cout << "float crd, int value " << std::endl ;

  test < cf_dim , trg_dim , chn , vsz , float , int , float >
    ( 4 , vspline::PERIODIC ) ;

  // we can use long-valued coefficients, but evaluation for these
  // types won't be vectorized (in test() there is a fallback for
  // cases where ele_type can't be vectorized which affects all
  // evaluations. vectorization is impossible here because
  // batches of coefficients are loaded using SIMD operations).
  // TODO might code fallback to goading

  std::cout << "double crd, long value " << std::endl ;

  test < cf_dim , trg_dim , chn , vsz , float , long , double >
    ( 3 , vspline::PERIODIC ) ;

  // all data in double. This will be vectorized, so it's quite
  // fast, and yet very precise.

  std::cout << "double crd, double value " << std::endl ;

  test < cf_dim , trg_dim , chn , vsz , double , double , double >
    ( 5 , vspline::REFLECT ) ;

  // finally we test with all data in long double. This won't
  // be vectorized at all, but it's extremely precise.
std::cout << "long double crd, long double value " << std::endl ; test < cf_dim , trg_dim , chn , vsz , long double , long double , long double > ( 3 , vspline::NATURAL ) ; } // when choosing vsize, the vectorization width, we don't go through // all values from 1 to 32, but only pick a few representative ones. // most of the time, vsize won't be picked 'manually' - and vectorization // widths which aren't a multiple of the hardware vector size are // quite futile, but here the point is to see if the code still // performs correctly even with 'odd' vsize values. template < typename ... other_types > void set_vsz ( int vsz ) { switch ( vsz ) { case 1: doit < ast(1) , other_types ... >() ; break ; // case 2: // doit < ast(2) , other_types ... >() ; // break ; case 3: doit < ast(3) , other_types ... >() ; break ; // case 4: // doit < ast(4) , other_types ... >() ; // break ; // case 5: // doit < ast(5) , other_types ... >() ; // break ; // case 6: // doit < ast(6) , other_types ... >() ; // break ; // case 7: // doit < ast(7) , other_types ... >() ; // break ; // case 8: // doit < ast(8) , other_types ... >() ; // break ; // case 9: // doit < ast(9) , other_types ... >() ; // break ; // case 10: // doit < ast(10) , other_types ... >() ; // break ; // case 11: // doit < ast(11) , other_types ... >() ; // break ; // case 12: // doit < ast(12) , other_types ... >() ; // break ; // case 13: // doit < ast(13) , other_types ... >() ; // break ; // case 14: // doit < ast(14) , other_types ... >() ; // break ; // case 15: // doit < ast(15) , other_types ... >() ; // break ; case 16: doit < ast(16) , other_types ... >() ; break ; // case 17: // doit < ast(17) , other_types ... >() ; // break ; // case 18: // doit < ast(18) , other_types ... >() ; // break ; // case 19: // doit < ast(19) , other_types ... >() ; // break ; // case 20: // doit < ast(20) , other_types ... >() ; // break ; // case 21: // doit < ast(21) , other_types ... 
>() ; // break ; // case 22: // doit < ast(22) , other_types ... >() ; // break ; // case 23: // doit < ast(23) , other_types ... >() ; // break ; // case 24: // doit < ast(24) , other_types ... >() ; // break ; // case 25: // doit < ast(25) , other_types ... >() ; // break ; // case 26: // doit < ast(26) , other_types ... >() ; // break ; // case 27: // doit < ast(27) , other_types ... >() ; // break ; // case 28: // doit < ast(28) , other_types ... >() ; // break ; // case 29: // doit < ast(29) , other_types ... >() ; // break ; // case 30: // doit < ast(30) , other_types ... >() ; // break ; // case 31: // doit < ast(31) , other_types ... >() ; // break ; // case 32: // doit < ast(32) , other_types ... >() ; // break ; default: break ; } } template < typename ... other_types > void set_chn ( int chn , int vsz ) { switch ( chn ) { case 1: set_vsz < ast(1) , other_types ... > ( vsz ) ; break ; case 2: set_vsz < ast(2) , other_types ... > ( vsz ) ; break ; // case 3: // set_vsz < ast(3) , other_types ... > ( vsz ) ; // break ; default: break ; } } template < typename ... other_types > void set_trg_dim ( int trg_dim , int chn , int vsz ) { switch ( trg_dim ) { case 1: set_chn < ast(1) , other_types ... > ( chn , vsz ) ; break ; case 2: set_chn < ast(2) , other_types ... > ( chn , vsz ) ; break ; // case 3: // set_chn < ast(3) , other_types ... > ( chn , vsz ) ; // break ; default: break ; } } template < typename ... other_types > void set_cf_dim ( int cf_dim , int trg_dim , int chn , int vsz ) { switch ( cf_dim ) { case 1: set_trg_dim < ast(1) , other_types ... > ( trg_dim , chn , vsz ) ; break ; case 2: set_trg_dim < ast(2) , other_types ... > ( trg_dim , chn , vsz ) ; break ; // case 3: // set_trg_dim < ast(3) , other_types ... 
> ( trg_dim , chn , vsz ) ; // break ; default: break ; } }

int main ( int argc , char * argv[] )
{
  std::cout << std::fixed << std::showpos << std::showpoint
            << std::setprecision(18) ;
  std::cerr << std::fixed << std::showpos << std::showpoint
            << std::setprecision(18) ;

  for ( int cf_dim = 1 ; cf_dim <= 2 ; cf_dim++ )
  {
    for ( int trg_dim = 1 ; trg_dim <= 2 ; trg_dim++ )
    {
      for ( int chn = 1 ; chn <= 2 ; chn++ )
      {
        // for ( int vsz = 1 ; vsz <= 32 ; vsz++ )
        // {
        //   set_cf_dim ( cf_dim , trg_dim , chn , vsz ) ;
        // }
        set_cf_dim ( cf_dim , trg_dim , chn , 1 ) ;
        set_cf_dim ( cf_dim , trg_dim , chn , 3 ) ;
        set_cf_dim ( cf_dim , trg_dim , chn , 16 ) ;
      }
    }
  }
  std::cout << "terminating" << std::endl ;
}

/************************************************************************/
/*                                                                      */
/*  vspline - a set of generic tools for creation and evaluation        */
/*  of uniform b-splines                                                */
/*                                                                      */
/*  Copyright 2017 - 2018 by Kay F. Jahnke                              */
/*                                                                      */
/*  The git repository for this software is at                          */
/*  https://bitbucket.org/kfj/vspline                                   */
/*                                                                      */
/*  Please direct questions, bug reports, and contributions to          */
/*  kfjahnke+vspline@gmail.com                                          */
/*                                                                      */
/*  Released under the MIT licence reproduced in full in the LICENSE    */
/*  file above.                                                         */
/*                                                                      */
/************************************************************************/

/// \file self_test.cc
///
/// \brief test consistency of precomputed data, prefiltering and evaluation
///
/// the self-test uses three entities: a unit pulse, the reconstruction
/// kernel (which is a unit-spaced sampling of the b-spline basis function's
/// values at integer arguments) and the prefilter. These data have a few
/// fundamental relations we can test:
///
/// - prefiltering the reconstruction kernel results in a unit pulse
/// - producing a unit-spaced sampling of a spline with only one single
///   unit-valued coefficient produces the reconstruction kernel
///
/// Performing the tests also assures us that the evaluation machinery with
/// its 'weight matrix' does what's intended, and that access to the basis
/// function and its derivatives (see basis_functor) functions correctly.
///
/// With rising spline degree, the test is ever more demanding. this is
/// reflected in the maximum error returned for every degree: it rises
/// with the spline degree. With the complex operations involved, seeing
/// a maximal error in the order of magnitude of 1e-12 for working with
/// long doubles seems reasonable enough (on my system).
///
/// I assume that the loss of precision with high degrees is mainly due to
/// the filter's overall gain.
/// Since the gain is applied as a factor before or during prefiltering,
/// and prefiltering has the reverse effect, in the sum we end up having
/// the effect of first multiplying with, then dividing by a very large
/// number, 'crushing' the values to less precision. In bootstrap.cc, I
/// perform the test with GMP high precision floats and there I can avoid
/// the problem, since the magnitude of the numbers I use there is well
/// beyond the magnitude of the gain occurring with high spline degrees.
/// So the conclusion is that high spline degrees can be used, but may not
/// produce very precise results, since the normal C++ types are hard
/// pressed to cope with the dynamic range covered by the filter.
///
/// The most time is spent calculating the basis function values recursively
/// using cdb_bspline_basis, for cross-reference.
///
/// compile: clang++ -O3 -std=c++11 self_test.cc -o self_test -pthread

#include #include #include #include #include #include #include #include #include

long double circular_test_previous ;

template < typename dtype >
dtype self_test ( int degree ,
                  dtype threshold ,
                  dtype strict_threshold )
{
  if ( degree == 0 )
    circular_test_previous = 1 ;

  dtype max_error = 0 ;

  // self-test for plausibility. we know that using the b-spline
  // prefilter of given degree on a set of unit-spaced samples of
  // the basis function (at 0, +/-k) should yield a unit pulse.

  // we create a bspline object for 100 core coefficients

  typedef vspline::bspline < dtype , 1 > spline_type ;
  spline_type bspl ( 100 , degree ) ;

  // next we create an evaluator for this spline.
  // Note how, to be as precise as possible for this test,
  // we specify 'double' as elementary type for coordinates.
  // This is overkill, but so what...

  auto ev = vspline::make_evaluator < spline_type , double > ( bspl ) ;

  // and two arrays with the same size as the spline's 'core'.
vigra::MultiArray<1,dtype> result ( 100 ) ; vigra::MultiArray<1,dtype> reference ( 100 ) ; // we obtain a pointer to the reference array's center dtype * p_center = &(reference[50]) ; // we can obtain the reconstruction kernel by accessing precomputed // basis function values via bspline_basis_2() for ( int x = - degree / 2 ; x <= degree / 2 ; x++ ) { p_center[x] = vspline::bspline_basis_2 ( x+x , degree ) ; } // alternatively we can put a unit pulse into the center of the // coefficient array, transform and assign back. Transforming // the unit pulse produces the reconstruction kernel. Doing so // additionally assures us that the evaluation machinery with // it's 'weight matrix' is functioning correctly. // we obtain a pointer to the coefficient array's center p_center = &(bspl.core[50]) ; *p_center = 1 ; vspline::transform ( ev , result ) ; bspl.core = result ; // we compare the two versions of the reconstruction kernel we // have to make sure they agree. the data should be identical. // we also compare with the result of a vspline::basis_functor, // which uses the same method of evaluating a b-spline with a // single unit-valued coefficient. // Here we expect complete agreement. vspline::basis_functor bf ( degree ) ; for ( int x = 50 - degree / 2 ; x <= 50 + degree / 2 ; x++ ) { assert ( result[x] == reference[x] ) ; assert ( bf ( x - 50 ) == reference[x] ) ; } // now we apply the prefilter, expecting that afterwards we have // a single unit pulse coinciding with the location where we // put the center of the kernel. This test will exceed the strict // threshold, but the ordinary one will hold. 
bspl.prefilter() ; // we test our predicate for ( int x = - degree / 2 ; x <= degree / 2 ; x++ ) { dtype error ; if ( x == 0 ) { // at the origin we expect a value of 1.0 error = std::fabs ( p_center [ x ] - 1.0 ) ; } else { // off the origin we expect a value of 0.0 error = std::fabs ( p_center [ x ] ) ; } if ( error > threshold ) std::cout << "unit pulse test, x = " << x << ", error = " << error << std::endl ; max_error = std::max ( max_error , error ) ; } // test bspline_basis() at k/2, k E N against precomputed values // while bspline_basis at whole arguments has delta == 0 and hence // makes no use of rows beyond row 0 of the weight matrix, arguments // at X.5 use all these rows. We can test against bspline_basis_2, // which provides precomputed values for half unit steps. // we run this test with strict_threshold. int xmin = - degree - 1 ; int xmax = degree + 1 ; for ( int x2 = xmin ; x2 <= xmax ; x2++ ) { auto a = bf ( x2 / 2.0L ) ; auto b = vspline::bspline_basis_2 ( x2 , degree ) ; auto error = std::abs ( a - b ) ; if ( error > strict_threshold ) std::cout << "bfx2: " << x2 / 2.0 << " : " << a << " <--> " << b << " error " << error << std::endl << std::endl ; max_error = std::max ( max_error , error ) ; } // set all coefficients to 1, evaluate. result should be 1, // because every set of weights is a partition of unity // this is a nice test, because it requires participation of // all rows in the weight matrix, since the random arguments // produce arbitrary delta. we run this test with strict_threshold. 
{ std::random_device rd ; std::mt19937 gen ( rd() ) ; std::uniform_real_distribution<> dis ( 50 - degree -1 , 50 + degree + 1 ) ; bspl.container = 1 ; for ( int k = 0 ; k < 1000 ; k++ ) { double x = dis ( gen ) ; dtype y = ev ( x ) ; dtype error = std::abs ( y - 1 ) ; if ( error > strict_threshold ) std::cout << "partition of unity test, d0: " << x << " : " << y << " <--> " << 1 << " error " << error << std::endl << std::endl ; max_error = std::max ( max_error , error ) ; } vigra::TinyVector < int , 1 > deriv_spec ; // we also test evaluation of derivatives up to 2. // Here, with the coefficients all equal, we expect 0 as a result. if ( degree > 1 ) { deriv_spec[0] = 1 ; auto dev = vspline::make_evaluator < spline_type , double > ( bspl , deriv_spec ) ; for ( int k = 0 ; k < 1000 ; k++ ) { double x = dis ( gen ) ; dtype y = dev ( x ) ; dtype error = std::abs ( y ) ; if ( error > strict_threshold ) std::cout << "partition of unity test, d1: " << x << " : " << y << " <--> " << 0 << " error " << error << std::endl << std::endl ; max_error = std::max ( max_error , error ) ; } } if ( degree > 2 ) { deriv_spec[0] = 2 ; auto ddev = vspline::make_evaluator< spline_type , double > ( bspl , deriv_spec ) ; for ( int k = 0 ; k < 1000 ; k++ ) { double x = dis ( gen ) ; dtype y = ddev ( x ) ; dtype error = std::abs ( y ) ; if ( error > strict_threshold ) std::cout << "partition of unity test, d2: " << x << " : " << y << " <--> " << 0 << " error " << error << std::endl << std::endl ; max_error = std::max ( max_error , error ) ; } } } // for smaller degrees, the cdb recursion is usable, so we can // doublecheck basis_functor for a few sample x. The results here // are also very precise, so we use strict_threshold. Initially // I took this up to degree 19, but now it's only up to 13, which // should be convincing enough and is much faster. 
if ( degree < 13 ) // was: 19 { std::random_device rd ; std::mt19937 gen ( rd() ) ; std::uniform_real_distribution<> dis ( - degree -1 , degree + 1 ) ; for ( int k = 0 ; k < 1000 ; k++ ) { dtype x = dis ( gen ) ; dtype a = bf ( x ) ; dtype b = vspline::cdb_bspline_basis ( x , degree ) ; dtype error = std::abs ( a - b ) ; if ( error > strict_threshold ) std::cout << "bf: " << x << " : " << a << " <--> " << b << " error " << error << std::endl << std::endl ; max_error = std::max ( max_error , error ) ; } } if ( degree > 1 && degree < 13 ) { vspline::basis_functor dbf ( degree , 1 ) ; std::random_device rd ; std::mt19937 gen ( rd() ) ; std::uniform_real_distribution<> dis ( - degree -1 , degree + 1 ) ; for ( int k = 0 ; k < 1000 ; k++ ) { dtype x = dis ( gen ) ; dtype a = dbf ( x ) ; dtype b = vspline::cdb_bspline_basis ( x , degree , 1 ) ; dtype error = std::abs ( a - b ) ; if ( error > strict_threshold ) std::cout << "dbf: " << x << " : " << a << " <--> " << b << " error " << error << std::endl << std::endl ; max_error = std::max ( max_error , error ) ; } } if ( degree > 2 && degree < 13 ) { vspline::basis_functor ddbf ( degree , 2 ) ; std::random_device rd ; std::mt19937 gen ( rd() ) ; std::uniform_real_distribution<> dis ( - degree -1 , degree + 1 ) ; for ( int k = 0 ; k < 1000 ; k++ ) { dtype x = dis ( gen ) ; dtype a = ddbf ( x ) ; dtype b = vspline::cdb_bspline_basis ( x , degree , 2 ) ; dtype error = std::abs ( a - b ) ; if ( error > strict_threshold ) std::cout << "ddbf: " << x << " : " << a << " <--> " << b << " error " << error << std::endl << std::endl ; max_error = std::max ( max_error , error ) ; } } if ( degree > 0 ) { std::random_device rd ; std::mt19937 gen ( rd() ) ; std::uniform_real_distribution<> dis ( -1 , 1 ) ; dtype circle_error = 0 ; dtype avg_circle_error = 0 ; // consider a spline with a single 1.0 coefficient at the origin // reference is the spline's value at ( 1 , 0 ), which is // certainly on the unit circle dtype v2 = bf ( 1 ) * bf ( 0 
) ;

    // let's assume 10000 evaluations is a good enough
    // statistical base

    for ( int k = 0 ; k < 10000 ; k++ )
    {
      // take a random x and y coordinate

      double x = dis ( gen ) ;
      double y = dis ( gen ) ;

      // normalize to unit circle

      double s = sqrt ( x * x + y * y ) ;
      x /= s ;
      y /= s ;

      // and take the value of the spline there, which is
      // the product of the basis function values

      dtype v1 = bf ( x ) * bf ( y ) ;

      // we assume that with rising spline degree, the difference
      // between these two values will become ever smaller, as the
      // equipotential lines of the splines become more and
      // more circular

      dtype error = std::abs ( v1 - v2 ) ;
      circle_error = std::max ( circle_error , error ) ;
      avg_circle_error += error ;
    }

    assert ( circle_error < circular_test_previous ) ;
    circular_test_previous = circle_error ;

    // in my tests, circle_error goes down to ca 7.4e-7,
    // so with degree 45, evaluations on the unit circle
    // differ very little from each other.

    // std::cout << "unit circle test, degree " << degree
    //           << " emax = " << circle_error
    //           << " avg(e) = " << avg_circle_error / 10000
    //           << " value: " << v2 << std::endl ;
  }

  // std::cout << "max error for degree " << degree
  //           << ": " << max_error << std::endl ;

  return max_error ;
}

// run a self test of vspline's constants, prefiltering and evaluation.
// This tests if a set of common operations produces larger errors than
// anticipated, to alert us if something has gone amiss.
// The thresholds are fixed heuristically to be quite close to the actually
// occurring maximum error.
int main ( int argc , char * argv[] )
{
  long double max_error_l = 0 ;

  for ( int degree = 0 ; degree <= vspline_constants::max_degree ; degree++ )
  {
    max_error_l = std::max
      ( max_error_l ,
        self_test < long double > ( degree , 4e-13l , 1e-18 ) ) ;
  }

  std::cout << "maximal error of tests with long double precision: "
            << max_error_l << std::endl ;

  double max_error_d = 0 ;

  for ( int degree = 0 ; degree <= vspline_constants::max_degree ; degree++ )
  {
    max_error_d = std::max
      ( max_error_d ,
        self_test < double > ( degree , 1e-9 , 7e-16 ) ) ;
  }

  std::cout << "maximal error of tests with double precision: "
            << max_error_d << std::endl ;

  float max_error_f = 0 ;

  // test float only up to degree 15.

  for ( int degree = 0 ; degree < 16 ; degree++ )
  {
    max_error_f = std::max
      ( max_error_f ,
        self_test < float > ( degree , 3e-6 , 4e-7 ) ) ;
  }

  std::cout << "maximal error of tests with float precision: "
            << max_error_f << std::endl ;

  if (    max_error_l < 4e-13
       && max_error_d < 1e-9
       && max_error_f < 3e-6 )
    std::cout << "test passed" << std::endl ;
  else
    std::cout << "test failed" << std::endl ;
}

/************************************************************************/
/*                                                                      */
/*  vspline - a set of generic tools for creation and evaluation        */
/*  of uniform b-splines                                                */
/*                                                                      */
/*  Copyright 2015 - 2018 by Kay F. Jahnke                              */
/*                                                                      */
/*  Released under the MIT licence reproduced in full in the LICENSE    */
/*  file above.                                                         */
/*                                                                      */
/************************************************************************/

/// \file slice.cc
///
/// \brief create 2D image data from a 3D spline
///
/// build a 3D volume from samples of the RGB colour space,
/// build a spline over it and extract a 2D slice, using vspline::transform().
/// In this example, we use an 'array-based' transform, where the coordinates
/// at which the spline is to be evaluated are held in an array of the same
/// extent as the target.
///
/// compile with:
/// clang++ -std=c++11 -march=native -o slice -O3 -pthread -DUSE_VC=1 slice.cc -lvigraimpex -lVc
/// or: clang++ -std=c++11 -march=native -o slice -O3 -pthread slice.cc -lvigraimpex
/// g++ also works.
#include #include #include #include #include int main ( int argc , char * argv[] ) { // pixel_type is the result type, an RGB float pixel typedef vigra::TinyVector < float , 3 > pixel_type ; // voxel_type is the source data type - the same as pixel_type typedef vigra::TinyVector < float , 3 > voxel_type ; // coordinate_3d has a 3D coordinate typedef vigra::TinyVector < float , 3 > coordinate_3d ; // warp_type is a 2D array of coordinates typedef vigra::MultiArray < 2 , coordinate_3d > warp_type ; // target_type is a 2D array of pixels typedef vigra::MultiArray < 2 , pixel_type > target_type ; // we want a b-spline with natural boundary conditions vigra::TinyVector < vspline::bc_code , 3 > bcv ( vspline::NATURAL ) ; // create quintic 3D b-spline object containing voxels vspline::bspline < voxel_type , 3 > space ( vigra::Shape3 ( 10 , 10 , 10 ) , 5 , bcv ) ; // fill the b-spline's core with a three-way gradient for ( int z = 0 ; z < 10 ; z++ ) { for ( int y = 0 ; y < 10 ; y++ ) { for ( int x = 0 ; x < 10 ; x++ ) { voxel_type & c ( space.core [ vigra::Shape3 ( x , y , z ) ] ) ; c[0] = 25.5 * x ; c[1] = 25.5 * y ; c[2] = 25.5 * z ; } } } // prefilter the b-spline space.prefilter() ; // get an evaluator for the b-spline typedef vspline::evaluator < coordinate_3d , voxel_type > ev_type ; ev_type ev ( space ) ; // now make a 'warp' array with 1920X1080 3D coordinates warp_type warp ( vigra::Shape2 ( 1920 , 1080 ) ) ; // we want the coordinates to follow this scheme: // warp(x,y) = (x,1-x,y) // scaled appropriately for ( int y = 0 ; y < 1080 ; y++ ) { for ( int x = 0 ; x < 1920 ; x++ ) { coordinate_3d & c ( warp [ vigra::Shape2 ( x , y ) ] ) ; c[0] = float ( x ) / 192.0 ; c[1] = 10.0 - c[0] ; c[2] = float ( y ) / 108.0 ; } } // this is where the result should go: target_type target ( vigra::Shape2 ( 1920 , 1080 ) ) ; // now we perform the transform, yielding the result vspline::transform ( ev , warp , target ) ; // store the result with vigra impex vigra::ImageExportInfo 
imageInfo ( "slice.tif" ); vigra::exportImage ( target , imageInfo .setPixelType("UINT8") .setCompression("100") .setForcedRangeMapping ( 0 , 255 , 0 , 255 ) ) ; std::cout << "result was written to slice.tif" << std::endl ; exit ( 0 ) ; } kfj-vspline-4b365417c271/example/slice2.cc000066400000000000000000000211711333775006700200400ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. 
*/ /* */ /************************************************************************/
/// \file slice2.cc
///
/// \brief create 2D image data from a 3D spline
///
/// build a 3D volume from samples of the RGB colour space
/// build a spline over it and extract a 2D slice, using vspline::transform()
///
/// while the result is just about the same as the one we get from slice.cc,
/// here we use additional functors to create the colour gradient and do the
/// coordinate transformation.
///
/// compile with:
/// clang++ -std=c++11 -march=native -o slice2 -O3 -pthread -DUSE_VC=1 slice2.cc -lvigraimpex -lVc
/// or: clang++ -std=c++11 -march=native -o slice2 -O3 -pthread slice2.cc -lvigraimpex
/// g++ also works.

#include <iostream>
#include <vspline/vspline.h>
#include <vigra/multi_array.hxx>
#include <vigra/rgbvalue.hxx>
#include <vigra/impex.hxx>

// pixel_type is the result type, an RGB pixel of unsigned char
typedef vigra::RGBValue < unsigned char , 0 , 1 , 2 > pixel_type ;

// voxel_type is the source data type, here we're using double precision
typedef vigra::TinyVector < double , 3 > voxel_type ;

// coordinate2_type has a 2D coordinate
typedef vigra::TinyVector < float , 2 > coordinate2_type ;

// coordinate3_type has a 3D coordinate
typedef vigra::TinyVector < float , 3 > coordinate3_type ;

// target_type is a 2D array of pixels
typedef vigra::MultiArray < 2 , pixel_type > target_type ;

// we'll use a common vectorization width of 8 throughout
enum { VSIZE = 8 } ;

// we'll use a functor to create the gradient in the b-spline
struct calculate_gradient_type
: public vspline::unary_functor < coordinate3_type , voxel_type , VSIZE >
{
  // this method generates a voxel from a 3D coordinate.
template < class IN , class OUT > void eval ( const IN & c , OUT & result ) const { // assign input to output and scale result = c ; result *= 25.5 ; // because we don't have the relevant vigra numeric and promote traits, // we *can't* write the obvious // result = 25.5 * c ; } ; } ; // type of b-spline evaluator producing pixels from 3D coordinates // here we pass all template arguments an evaluator can take: // - coordinate3_type for the type of incoming 3D coordinates // - pixel_type for the data type we want to receive as result // - VSIZE for the vectorization width // - -1 indicates we want unspecialized b-spline evaluation // - double: we want internal calculations done in double precision // - voxel_type will be the type of coefficients held in the spline typedef vspline::evaluator < coordinate3_type , pixel_type , VSIZE , -1 , double , voxel_type > ev_type ; // this functor is used for the coordinate transformation. It receives // 2D coordinates (discrete target coordinates) and produces 3D // coordinates which will be used to evaluate the spline. We pass // the vectorization width explicitly to make sure it's the same // as that used by the evaluator; if this weren't the case we could // not 'chain' them further down struct calculate_pickup_type : public vspline::unary_functor < coordinate2_type , coordinate3_type , VSIZE > { // this method transforms incoming discrete 2D coordinates - coordinates // pertaining to pixels in the target image - into 3D 'pick-up' coordinates // at which to evaluate the spline. Note how it's written as a template, // since the code for unvectorized and vectorized evaluation is just // the same. Note how we have coded coordinate2_type as consisting of // two floats, rather than two ints, which would have been just as well, // but which would have required transforming 'c' to it's floating point // equivalent before doing the maths. 
The 'index-based' version of
  // vspline::transform will feed the functor with the type it expects as
  // its incoming type, so coordinate2_type in this case - or its
  // vectorized equivalent.

  template < class IN , class OUT >
  void eval ( const IN & c , OUT & result ) const
  {
    result[0] = c[0] / 192.0f ;
    result[1] = 10.0f - result[0] ;
    result[2] = c[1] / 108.0f ;
  } ;
} ;

int main ( int argc , char * argv[] )
{
  // we want a b-spline with natural boundary conditions
  vigra::TinyVector < vspline::bc_code , 3 > bcv ( vspline::NATURAL ) ;

  // create quintic 3D b-spline object containing voxels
  // note the shape of the spline: it's ten units wide in each direction.
  // this explains the factor 25.5 used to calculate the voxels it holds:
  // the voxel's values will go from 0 to 255 for each channel
  vspline::bspline < voxel_type , 3 >
    colour_space ( vigra::Shape3 ( 10 , 10 , 10 ) , 5 , bcv ) ;

  // this functor will calculate the colour cube's content:
  calculate_gradient_type gradient ;

  // we could instead use vspline's 'amplify_type' to the same effect:
  // vspline::amplify_type < voxel_type , voxel_type , voxel_type , VSIZE >
  //   gradient ( voxel_type ( 25.5 ) ) ;

  // now we run an index-based transform on the spline's 'core'. This will
  // feed successive 3D coordinates to 'gradient', which will calculate
  // voxel values from them
  vspline::transform ( gradient , colour_space.core ) ;

  // prefilter the b-spline
  colour_space.prefilter() ;

  // create the coordinate transformation functor
  calculate_pickup_type pick ;

  // get an evaluator for the b-spline
  ev_type ev ( colour_space ) ;

  // 'chain' the coordinate transformation functor and the evaluator
  auto combined = vspline::chain ( pick , ev ) ;

  // this is where the result should go:
  target_type target ( vigra::Shape2 ( 1920 , 1080 ) ) ;

  // now we perform the transform, yielding the result
  // note how we use an 'index-based' transform feeding the functor
  // (combined) with discrete target coordinates.
Inside 'combined', // the incoming discrete coordinate is first transformed to the // 'pick-up' coordinate, which is in turn used to evaluate the // spline, yielding the result, which is stored in 'target'. vspline::transform ( combined , target ) ; // store the result with vigra impex vigra::ImageExportInfo imageInfo ( "slice.tif" ); vigra::exportImage ( target , imageInfo .setPixelType("UINT8") .setCompression("100") .setForcedRangeMapping ( 0 , 255 , 0 , 255 ) ) ; std::cout << "result was written to slice.tif" << std::endl ; exit ( 0 ) ; } kfj-vspline-4b365417c271/example/slice3.cc000066400000000000000000000121201333775006700200330ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /// \file slice3.cc /// /// \brief create 2D image data from a 3D spline /// /// build a 3D volume from samples of the RGB colour space /// build a spline over it and extract a 2D slice /// /// Here we use a quick shot solution. /// Oftentimes all it takes is a single run of an interpolation with /// as little programming effort as possible, never mind the performance. /// Again we use a b-spline with double-precision voxels as value_type, /// but instead of using vspline::transform, we simply run the calculations /// in loops. /// /// compile with: /// clang++ -std=c++11 -o slice3 -O3 -pthread slice3.cc -lvigraimpex /// g++ also works. 
#include <iostream>
#include <vspline/vspline.h>
#include <vigra/multi_array.hxx>
#include <vigra/rgbvalue.hxx>
#include <vigra/impex.hxx>

// voxel_type is the source data type
typedef vigra::RGBValue < double , 0 , 1 , 2 > voxel_type ;

// pixel_type is the result type, here we use a vigra::RGBValue for a change
typedef vigra::RGBValue < unsigned char , 0 , 1 , 2 > pixel_type ;

// coordinate_type has a 3D coordinate
typedef vigra::TinyVector < float , 3 > coordinate_type ;

int main ( int argc , char * argv[] )
{
  // we want a b-spline with natural boundary conditions
  vigra::TinyVector < vspline::bc_code , 3 > bcv ( vspline::NATURAL ) ;

  // create quintic 3D b-spline object containing voxels
  vspline::bspline < voxel_type , 3 >
    bspl ( vigra::Shape3 ( 10 , 10 , 10 ) , 5 , bcv ) ;

  // fill the b-spline's core with a three-way gradient
  for ( int z = 0 ; z < 10 ; z++ )
  {
    for ( int y = 0 ; y < 10 ; y++ )
    {
      for ( int x = 0 ; x < 10 ; x++ )
      {
        bspl.core ( x , y , z )
          = voxel_type ( 25.5 * x , 25.5 * y , 25.5 * z ) ;
      }
    }
  }

  // prefilter the b-spline
  bspl.prefilter() ;

  // get an evaluator for the b-spline
  auto ev = vspline::make_evaluator ( bspl ) ;

  // this is where the result should go:
  vigra::MultiArray < 2 , pixel_type > target ( vigra::Shape2 ( 1920 , 1080 ) ) ;

  // we want the pick-up coordinates to follow this scheme:
  // pick ( x , y ) = ( x , 1 - x , y )
  // scaled appropriately
  for ( int y = 0 ; y < 1080 ; y++ )
  {
    for ( int x = 0 ; x < 1920 ; x++ )
    {
      // calculate the pick-up coordinate
      coordinate_type pick { x / 192.0f , 10.0f - x / 192.0f , y / 108.0f } ;

      // call the evaluator and store the result to 'target'
      target ( x , y ) = ev ( pick ) ;
    }
  }

  // store the result with vigra impex
  vigra::ImageExportInfo imageInfo ( "slice.tif" );
  vigra::exportImage ( target ,
                       imageInfo
                       .setPixelType("UINT8")
                       .setCompression("100")
                       .setForcedRangeMapping ( 0 , 255 , 0 , 255 ) ) ;

  std::cout << "result was written to slice.tif" << std::endl ;
  exit ( 0 ) ;
}
kfj-vspline-4b365417c271/example/splinus.cc000066400000000000000000000203671333775006700203620ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /// \file splinus.cc /// /// \brief compare a periodic b-spline with a sine /// /// This is a simple example using a periodic b-spline /// over just two values: 1 and -1. This spline is used to approximate /// a sine function. You pass the spline's desired degree on the command /// line. Next you enter a number (interpreted as degrees) and the program /// will output the sine and the 'splinus' of the given angle. 
/// As you can see when playing with higher degrees, the higher the spline's
/// degree, the closer the match with the sine. So apart from serving as a
/// very simple demonstration of using a 1D periodic b-spline, it teaches us
/// that a periodic b-spline can approximate a sine.
/// To show off, we use long double as the spline's data type.
/// This program also shows a bit of functional programming magic, putting
/// together the 'splinus' functor from several of vspline's functional
/// building blocks.
///
/// compile with: clang++ -pthread -O3 -std=c++11 splinus.cc -o splinus

#include <cassert>
#include <cstdlib>
#include <cmath>
#include <iostream>
#include <vspline/vspline.h>

int main ( int argc , char * argv[] )
{
  if ( argc < 2 )
  {
    std::cerr << "please pass the spline degree on the command line"
              << std::endl ;
    exit ( 1 ) ;
  }

  int degree = std::atoi ( argv[1] ) ;

  if ( degree < 4 )
  {
    std::cout << "raising degree to 4, the minimum for this program"
              << std::endl ;
    degree = 4 ;
  }

  assert ( degree >= 4 && degree <= vspline_constants::max_degree ) ;

  // create the bspline object
  typedef vspline::bspline < long double , 1 > spline_type ;
  spline_type bsp ( 2 ,                 // two values
                    degree ,            // degree as per command line
                    vspline::PERIODIC , // periodic boundary conditions
                    0.0 ) ;             // no tolerance

  // the bspline object's 'core' is a MultiArrayView to the knot point
  // data, which we set one by one for this simple example:
  bsp.core[0] = 1.0L ;
  bsp.core[1] = -1.0L ;

  // now we prefilter the data
  bsp.prefilter() ;

  // we build 'splinus' as a functional construct. Inside the brace,
  // we 'chain' several vspline::unary_functors:
  // - a 'domain' which scales and shifts input to the spline's range.
  //   with the 'incoming' range of [ 90 , 450 ] and the spline's
  //   range of [ 0 , 2 ], the translation of incoming coordinates is:
  //   x' = 0 + 2 * ( x - 90 ) / ( 450 - 90 )
  // - a 'safe' evaluator for the spline. since the spline has been
  //   built with PERIODIC boundary conditions, this evaluator will
  //   map incoming coordinates into the first period, [0,2].
auto splinus = ( vspline::domain ( bsp , 90.0L , 450.0L ) + vspline::make_safe_evaluator < spline_type , long double > ( bsp ) ) ; // alternatively we can use this construct. This will work just about // the same, but has a potential flaw: If arithmetic imprecision should // land the output of the periodic gate just slightly below 90.0, the // domain may produce a value just below 0.0, resulting in a slightly // out-of-bounds access. So the construct above is preferable. // Just to demonstrate that vspline::grok also produces an object that // can be used with function call syntax, we use vspline::grok here. // Note how 'grokking' the chain of functors produces a simply typed // object, rather than the complexly typed result of the chaining // operation inside the brace: vspline::grok_type < long double , long double > splinus2 = vspline::grok ( vspline::periodic ( 90.0L , 450.0L ) + vspline::domain ( bsp , 90.0L , 450.0L ) + vspline::evaluator < long double , long double > ( bsp ) ) ; // we throw derivatives in the mix. If our spline models a sine, // it's derivatives should model the sine's derivatives, cos etc. // note how this is obscured by the higher 'steepness' of the spline, // which is over [ 0 , 2 ] , not [ 0 , 2 * pi ]. Hence the derivatives // come out amplified with a power of pi, which we compensate for. 
vigra::TinyVector < int , 1 > derivative ; derivative[0] = 1 ; vspline::evaluator < long double , long double > evd1 ( bsp , derivative ) ; derivative[0] = 2 ; vspline::evaluator < long double , long double > evd2 ( bsp , derivative ) ; derivative[0] = 3 ; vspline::evaluator < long double , long double > evd3 ( bsp , derivative ) ; auto splinus_d1 = ( vspline::domain ( bsp , 90.0L , 450.0L ) + vspline::periodic ( 0.0L , 2.0L ) + evd1 ) ; auto splinus_d2 = ( vspline::domain ( bsp , 90.0L , 450.0L ) + vspline::periodic ( 0.0L , 2.0L ) + evd2 ) ; auto splinus_d3 = ( vspline::domain ( bsp , 90.0L , 450.0L ) + vspline::periodic ( 0.0L , 2.0L ) + evd3 ) ; // now we let the user enter arbitrary angles, calculate the sine // and the 'splinus' and print the result and difference: while ( true ) { std::cout << " > " ; long double x ; std::cin >> x ; // get an angle long double xs = x * M_PI / 180.0L ; // note: sin() uses radians // finally we can produce both results. Note how we can use periodic_ev, // the combination of gate and evaluator, like an ordinary function. 
std::cout << "sin(" << x << ") = " << sin ( xs ) << std::endl << "cos(" << x << ") = " << cos ( xs ) << std::endl << "splinus(" << x << ") = " << splinus ( x ) << std::endl << "splinus2(" << x << ") = " << splinus2 ( x ) << std::endl << "difference sin/splinus: " << sin ( xs ) - splinus ( x ) << std::endl << "difference sin/splinus2: " << sin ( xs ) - splinus2 ( x ) << std::endl << "difference splinus/splinus2: " << splinus2 ( x ) - splinus ( x ) << std::endl << "derivatives: " << splinus_d1 ( x ) / M_PI << " " << splinus_d2 ( x ) / ( M_PI * M_PI ) << " " << splinus_d3 ( x ) / ( M_PI * M_PI * M_PI ) << std::endl ; } } kfj-vspline-4b365417c271/example/use_map.cc000066400000000000000000000111731333775006700203110ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* The git repository for this software is at */ /* */ /* https://bitbucket.org/kfj/vspline */ /* */ /* Please direct questions, bug reports, and contributions to */ /* */ /* kfjahnke+vspline@gmail.com */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. 
*/ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/
/*! \file use_map.cc

    \brief test program for code in map.h

    The program creates one gate of each of the types provided in map.h
    over the interval [0,1]. Then it queries a value and uses the gates
    on this value in turn, printing out the result.
*/

#include <iostream>
#include <iomanip>
#include <vspline/vspline.h>

const int VSIZE = vspline::vector_traits < double > :: size ;

template < class gate_type >
void test ( gate_type gx , double x , const char * mode )
{
  std::cout << mode << std::endl ;
  auto tester = vspline::mapper < typename gate_type::in_type > ( gx ) ;

  typedef double crd_type ;
  const crd_type crd { x } ;
  crd_type res ;

  tester.eval ( crd , res ) ;
  std::cout << "single-value operation:" << std::endl ;
  std::cout << crd << " -> " << res << std::endl ;

  if ( VSIZE > 1 )
  {
    typedef vspline::vector_traits < crd_type , VSIZE > :: ele_v crd_v ;
    crd_v inv ( crd ) ;
    crd_v resv ;
    tester.eval ( inv , resv ) ;
    std::cout << "vectorized operation:" << std::endl
              << inv << " -> " << resv << std::endl ;
  }
}

int main ( int argc , char * argv[] )
{
  double x ;
  std::cout << std::fixed << std::showpos << std::showpoint
            << std::setprecision(5);

  while ( true )
  {
    std::cout << "enter coordinate to map to [ 0.0 : 1.0 ]" << std::endl ;
    std::cin >> x ;

    try
    {
      test ( vspline::reject ( 0.0 , 1.0 ) , x , "REJECT:" ) ;
    }
    catch ( vspline::out_of_bounds )
    {
      std::cout << "exception out_of_bounds" << std::endl ;
    }

    test ( vspline::clamp (
0.0 , 1.0 , 0.0 , 1.0 ) , x , "CLAMP:" ) ; test ( vspline::mirror ( 0.0 , 1.0 ) , x , "MIRROR:" ) ; test ( vspline::periodic ( 0.0 , 1.0 ) , x , "PERIODIC:" ) ; } } kfj-vspline-4b365417c271/example/verify.cc000066400000000000000000000157631333775006700201750ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /*! \file verify.cc \brief verify bspline interpolation against polynomial A b-spline is a piecewise polynomial function. Therefore, it should model a polynomial of the same degree precisely. This program tests this assumption. 
While the test should hold throughout, we have to limit it to
    'reasonable' degrees, because we have to create the spline over a
    range sufficiently large to make margin errors disappear, so even if
    we want to, say, only look at the spline's values between 0 and 1,
    we have to have a few tens or even 100 values to the left and right
    of this interval, because the polynomial does not exhibit convenient
    features like periodicity or mirroring on the bounds.

    But since a polynomial, outside [-1,1], grows with the power of its
    degree, the values around the test interval become very large very
    quickly with high degrees. We can't reasonably expect to calculate
    a meaningful spline over such data. The test shows how the measured
    fidelity degrades with higher degrees due to this effect.

    Still, with 'reasonable' degrees, we see that the spline fits the
    signal very well indeed, demonstrating that the spline can
    faithfully represent a polynomial of equal degree.
*/

#include <cmath>
#include <vector>
#include <random>
#include <iostream>
#include <iomanip>
#include <vigra/multi_array.hxx>
#include <vspline/vspline.h>

template < class dtype >
struct random_polynomial
{
  int degree ;
  std::vector < dtype > coefficient ;

  // set up a polynomial with random coefficients in [-1,1]
  random_polynomial ( int _degree )
  : degree ( _degree ) ,
    coefficient ( _degree + 1 )
  {
    std::random_device rd ;
    std::mt19937 gen ( rd() ) ;
    std::uniform_real_distribution<> dis ( -1.0 , 1.0 ) ;
    for ( auto & e : coefficient )
      e = dis ( gen ) ;
  }

  // evaluate the polynomial at x
  dtype operator() ( dtype x )
  {
    dtype power = 1 ;
    dtype result = 0 ;
    for ( auto e : coefficient )
    {
      result += e * power ;
      power *= x ;
    }
    return result ;
  }
} ;

template < class dtype >
void polynominal_test ( int degree , const char * type_name )
{
  // this is the function we want to model:
  random_polynomial < long double > rp ( degree ) ;

  // we evaluate this function in the range [-200,200[
  // note that for high degrees, the signal will grow very
  // large outside [-1,1], 'spoiling' the test
  vigra::MultiArray < 1 , dtype > data (
vigra::Shape1 ( 400 ) ) ;

  for ( int i = 0 ; i < 400 ; i++ )
  {
    dtype x = ( i - 200 ) ;
    data[i] = rp ( x ) ;
  }

  // we create a b-spline over these data
  vspline::bspline < dtype , 1 > bspl ( 400 , degree , vspline::NATURAL , 0.0 ) ;
  bspl.prefilter ( data ) ;

  auto ev = vspline::make_evaluator < decltype(bspl), dtype > ( bspl ) ;

  // to test the spline against the polynomial, we generate random
  // arguments in [-2,2]
  std::random_device rd ;
  std::mt19937 gen ( rd() ) ;
  std::uniform_real_distribution<> dis ( -2.0 , 2.0 ) ;

  long double signal = 0 ;
  long double spline = 0 ;
  long double noise = 0 ;
  long double error ;
  long double max_error = 0 ;

  // now we evaluate the spline and the polynomial at equal arguments
  // and do the statistics
  for ( int i = 0 ; i < 10000 ; i++ )
  {
    long double x = dis ( gen ) ;
    long double p = rp ( dtype ( x ) ) ;
    // note how we have to translate x to spline coordinates here
    long double s = ev ( dtype ( x + 200 ) ) ;
    error = std::fabs ( p - s ) ;
    if ( error > max_error )
      max_error = error ;
    signal += std::fabs ( p ) ;
    spline += std::fabs ( s ) ;
    noise += error ;
  }

  // note: with aggressive optimization (like -ffast-math), this
  // overflow check may not fire:
  if ( std::isnan ( signal ) || std::isnan ( noise ) )
  {
    std::cout << type_name
              << " aborted due to numeric overflow" << std::endl ;
    return ;
  }

  long double mean_error = noise / 10000.0L ;

  // finally we echo the results of the test
  std::cout << type_name
            << " Mean error: " << mean_error
            << " Maximum error: " << max_error
            << " SNR " << int ( 20 * std::log10 ( signal / noise ) )
            << "dB" << std::endl ;
}

int main ( int argc , char * argv[] )
{
  std::cout << std::fixed << std::showpos << std::showpoint
            << std::setprecision(18) ;

  for ( int degree = 1 ; degree < 15 ; degree++ )
  {
    std::cout << "testing spline against polynomial, degree "
              << degree << std::endl ;
    polynominal_test < float > ( degree , "using float........" ) ;
    polynominal_test < double > ( degree , "using double......." ) ;
    polynominal_test < long double > ( degree , "using long double.."
) ; } } kfj-vspline-4b365417c271/extrapolate.h000066400000000000000000000164641333775006700174270ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* The git repository for this software is at */ /* */ /* https://bitbucket.org/kfj/vspline */ /* */ /* Please direct questions, bug reports, and contributions to */ /* */ /* kfjahnke+vspline@gmail.com */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /*! 
\file extrapolate.h \brief extrapolation of 1D data sets with specific boundary conditions */ #ifndef VSPLINE_EXTRAPOLATE_H #define VSPLINE_EXTRAPOLATE_H #include "common.h" namespace vspline { /// struct extrapolator is a helper class providing extrapolated /// values for a 1D buffer indexed with possibly out-of-range indices. /// The extrapolated value is returned by value. boundary conditions /// PERIODIC , MIRROR , REFLECT, NATURAL and CONSTANT are currently /// supported. /// An extrapolator is set up by passing the boundary condition code /// (see common.h) and a const reference to the 1D data set, coded /// as a 1D vigra::MultiArrayView. The view has to refer to valid data /// for the time the extrapolator is in use. /// Now the extrapolator object can be indexed with arbitrary indices, /// and it will return extrapolated values. The indexing is done with /// operator() rather than operator[] to mark the semantic difference. /// Note how buffers with size 1 are treated specially for some /// boundary conditions: here we simply return the value at index 0. template < class buffer_type > struct extrapolator { const buffer_type & buffer ; typedef typename buffer_type::value_type value_type ; // we handle the polymorphism by calling the specific extrapolation // routine via a method pointer. This enables us to provide a uniform // interface without having to set up a virtual base class and inherit // from it. typedef value_type ( extrapolator::*p_xtr ) ( int i ) const ; p_xtr _p_xtr ; value_type extrapolate_mirror ( int i ) const { int w = buffer.size() - 1 ; if ( w == 0 ) return buffer[0] ; i = std::abs ( i ) ; if ( i >= w ) { i %= 2 * w ; i -= w ; i = std::abs ( i ) ; i = w - i ; } return buffer [ i ] ; } value_type extrapolate_natural ( int i ) const { int w = buffer.size() - 1 ; if ( w == 0 ) return buffer[0] ; if ( i >= 0 && i <= w ) return buffer[i] ; int sign = i < 0 ? 
-1 : 1 ; i = std::abs ( i ) ; int p = 2 * w ; int np = i / p ; int r = i % p ; value_type help ; if ( r <= w ) { help = buffer[r] - buffer[0] ; help += np * 2 * ( buffer[w] - buffer[0] ) ; help *= sign ; help += buffer [ 0 ] ; return help ; } r = 2 * w - r ; help = 2 * ( buffer [ w ] - buffer [ 0 ] ); help -= ( buffer[r] - buffer [ 0 ] ) ; help += np * 2 * ( buffer[w] - buffer[0] ) ; help *= sign ; help += buffer [ 0 ] ; return help ; } value_type extrapolate_reflect ( int i ) const { int w = buffer.size() ; if ( i < 0 ) i = -1 - i ; if ( i >= w ) { i %= 2 * w ; if ( i >= w ) i = 2 * w - i - 1 ; } return buffer [ i ] ; } value_type extrapolate_periodic ( int i ) const { int w = buffer.size() ; if ( w == 1 ) return buffer[0] ; if ( i < 0 || i >= w ) { i %= w ; if ( i < 0 ) i += w ; } return buffer [ i ] ; } value_type extrapolate_clamp ( int i ) const { if ( i < 0 ) return buffer [ 0 ] ; int w = buffer.size() - 1 ; if ( i >= w ) return buffer [ w ] ; return buffer [ i ] ; } /// class extrapolator's constructor takes the boundary /// condition code and a const reference to the buffer. /// the specific extrapolation routine is picked in the /// case switch and assigned to the method pointer which /// will be invoked by operator(). extrapolator ( vspline::bc_code bc , const buffer_type & _buffer ) : buffer ( _buffer ) { switch ( bc ) { case vspline::PERIODIC : _p_xtr = & extrapolator::extrapolate_periodic ; break ; case vspline::REFLECT : _p_xtr = & extrapolator::extrapolate_reflect ; break ; case vspline::NATURAL : _p_xtr = & extrapolator::extrapolate_natural ; break ; case vspline::MIRROR : _p_xtr = & extrapolator::extrapolate_mirror ; break ; case vspline::CONSTANT : _p_xtr = & extrapolator::extrapolate_clamp ; break ; default: throw vspline::not_implemented ( "extrapolator: unknown boundary condition" ) ; break ; } } /// operator() uses the specific extrapolation method to provide /// a value for position i. 
value_type operator() ( const int & i ) const { return (this->*_p_xtr) ( i ) ; } } ; } ; // namespace vspline #endif // #define VSPLINE_EXTRAPOLATE_H kfj-vspline-4b365417c271/filter.h000066400000000000000000001727721333775006700163710ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* The git repository for this software is at */ /* */ /* https://bitbucket.org/kfj/vspline */ /* */ /* Please direct questions, bug reports, and contributions to */ /* */ /* kfjahnke+vspline@gmail.com */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /*! 
\file filter.h \brief generic implementation of separable filtering for nD arrays This body of code provides the application of separable filters, not the filters themselves. Most essential for vspline is the use for b-spline prefiltering, see prefilter.h for the implementation of the specific filter. Another type of filter used by and provided by vspline is general separable convolution, which, inside vspline, is used for reconstruction of the original data from b-spline coefficients, see convolve.h. The code in this file is what I call 'wielding' code. Its function is to present the data in such a fashion that the code needed for the actual filter is a reasonably trivial 1D operation. 'Presenting' the data is a complex operation in vspline: the data are distributed to a set of worker threads, and they are restructured so that they can be processed by the processor's vector units. All of this is optional and transparent to the calling code. The 'wielding' code in this file is structurally similar to the code in transform.h, but here we use specific buffering operations which would make no sense there: for separable filtering, we have to preserve the adjacency of the data along the processing axis and present it to the filter, which is unnecessary for transformations, where each value can be processed in isolation. Most of the functionality in this file is in namespace detail, signalling that it is not meant to be called from outside. Class buffer_handling and the data types it uses to interface with nD memory are the exception, since they are meant to be inherited/used to implement specific filters. At the bottom of the file there's a free function template called 'filter'. This is what other code will normally call. */ #include <vector> #include "common.h" #ifndef VSPLINE_FILTER_H #define VSPLINE_FILTER_H namespace vspline { /// class 'bundle' holds all information needed to access a set of /// vsize 1D subarrays of an nD array.
This is the data structure /// we use to tell the buffering and unbuffering code which data /// we want it to put into the buffer or distribute back out. The /// buffer itself holds the data in compact form, ready for vector /// code to access them at maximum speed. template < class dtype , // data type size_t vsize > // vector width struct bundle { dtype * data ; // data base address const std::ptrdiff_t * idx ; // pointer to gather/scatter indexes std::ptrdiff_t stride ; // stride in units of dtype unsigned long z ; // number of repetitions bundle ( dtype * _data , const std::ptrdiff_t * _idx , std::ptrdiff_t _stride , unsigned long _z ) : data ( _data ) , idx ( _idx ) , stride ( _stride ) , z (_z ) { } ; } ; /// move bundle data to compact memory template < class stype , // source data type class ttype , // target data type size_t vsize > // vector width void move ( const bundle < stype , vsize > & src , ttype * trg , std::ptrdiff_t trg_stride ) { auto z = src.z ; auto ps = src.data ; while ( z-- ) // repeat z times: { for ( size_t i = 0 ; i < vsize ; i++ ) { // load from source, store to target, using ith index trg [ i ] = ttype ( ps [ src.idx [ i ] ] ) ; } ps += src.stride ; // apply stride to source trg += trg_stride ; // and target } } // nearly the same, but takes ni as runtime parameter, effectively // limiting the transfer to the first ni offsets in the bundle. // This will only be used rarely, so performance is less of an issue. 
template < class stype , // source data type class ttype , // target data type size_t vsize > // vector width void move ( const bundle < stype , vsize > & src , ttype * trg , std::ptrdiff_t trg_stride , int ni ) { auto z = src.z ; auto ps = src.data ; while ( z-- ) // repeat z times: { for ( int i = 0 ; i < ni ; i++ ) { // load from source, store to target, using ith index trg [ i ] = ttype ( ps [ src.idx [ i ] ] ) ; } trg += trg_stride ; ps += src.stride ; // apply stride to source } } /// move data from compact memory to bundle template < class stype , // source data type class ttype , // target data type size_t vsize > // vector width void move ( const stype * src , std::ptrdiff_t src_stride , const bundle < ttype , vsize > & trg ) { auto z = trg.z ; auto pt = trg.data ; while ( z-- ) // repeat z times: { for ( size_t i = 0 ; i < vsize ; i++ ) { // load from source, store to target, using ith index pt [ trg.idx [ i ] ] = ttype ( src [ i ] ) ; } src += src_stride ; pt += trg.stride ; // apply stride to target } } // nearly the same, but takes ni as runtime parameter, effectively // limiting the transfer to the first ni offsets in the bundle. template < class stype , // source data type class ttype , // target data type size_t vsize > // vector width void move ( const stype * src , std::ptrdiff_t src_stride , const bundle < ttype , vsize > & trg , int ni ) { auto z = trg.z ; auto pt = trg.data ; while ( z-- ) // repeat z times: { for ( int i = 0 ; i < ni ; i++ ) { // load from source, store to target, using ith index pt [ trg.idx [ i ] ] = ttype ( src [ i ] ) ; } src += src_stride ; pt += trg.stride ; // apply stride to target } } /// buffer_handling provides services needed for interfacing /// with a buffer of simdized/goading data. The init() routine /// receives two views: one to a buffer accepting incoming data, /// and one to a buffer providing results. 
Currently, all filters /// used in vspline operate in-place, but the two-argument form /// leaves room to manoeuvre. /// get() and put() receive 'bundle' arguments which are used /// to transfer incoming data to the view defined in in_window, /// and to transfer result data from the view defined in /// out_window back to target memory. template < template < typename , size_t > class _vtype , typename _dtype , size_t _vsize > class buffer_handling { protected: enum { vsize = _vsize } ; typedef _dtype dtype ; typedef _vtype < dtype , vsize > vtype ; vigra::MultiArrayView < 1 , vtype > in_window ; vigra::MultiArrayView < 1 , vtype > out_window ; void init ( vigra::MultiArrayView < 1 , vtype > & _in_window , vigra::MultiArrayView < 1 , vtype > & _out_window ) { in_window = _in_window ; out_window = _out_window ; } // get and put receive 'bundle' objects, currently the code // uses a template for the arguments but we might fix it to // bundles only // note the use of 'offset' which is needed for situations // when input/output consists of several arrays // why not use ni all the time? I suspect // fixed-width moves are faster. TODO test // KFJ 2018-02-20 added parameter for the buffer's stride from // one datum to the next, expressed in units of 'dtype'. This // is needed if the buffer contains SimdArrays which hold // padding and therefore are larger than vsize dtype. This only // occurs for certain vsize values, not for the default as // per common.h, so it went unnoticed for some time.
static const std::ptrdiff_t bf_stride = sizeof(vtype) / sizeof(dtype) ; public: /// fetch data from 'source' into the buffer 'in_window' template < class tractor > void get ( const tractor & src , std::ptrdiff_t offset = 0 , int ni = vsize ) const { if ( ni == vsize ) // fixed-width move move ( src , (dtype*) ( in_window.data() + offset ) , bf_stride ) ; else // reduced width move move ( src , (dtype*) ( in_window.data() + offset ) , bf_stride , ni ) ; } /// deposit result data from 'out_window' into target memory template < class tractor > void put ( const tractor & trg , std::ptrdiff_t offset = 0 , int ni = vsize ) const { if ( ni == vsize ) move ( (const dtype *) ( out_window.data() + offset ) , bf_stride , trg ) ; else move ( (const dtype *) ( out_window.data() + offset ) , bf_stride , trg , ni ) ; } } ; namespace detail { /// 'present' feeds 'raw' data to a filter and returns the filtered /// data. In order to perform this task with maximum efficiency, /// the actual code is quite involved. /// /// we have two variants of the routine, one for 'stacks' of several /// arrays (vpresent) and one for single arrays (present). /// /// The routine used in vspline is 'present', 'vpresent' is for special /// cases. present splits the data into 'bundles' of 1D subarrays /// collinear to the processing axis. These bundles are fed to the /// 'handler', which copies them into a buffer, performs the actual /// filtering, and then writes them back to target memory. /// /// Using 'vpresent', incoming data are taken as std::vectors of /// source_view_type. The incoming arrays have to have the same extent /// in every dimension *except* the processing axis. While the actual /// process of extracting parts of the data for processing is slightly /// more involved, it is analogous to first concatenating all source /// arrays into a single array, stacking along the processing axis. 
/// The combined array is then split up into 1D subarrays collinear /// to the processing axis, and sets of these subarrays are passed to /// the handler by calling its 'get' method. The set of 1D subarrays /// is coded as a 'bundle', which describes such a set by a combination /// of base address and a set of gather/scatter indexes. /// /// Once the data have been accepted by the handler, the handler's /// operator() is called, which results in the handler filtering the /// data (or whatever else it might do). Next, the processed data are /// taken back from the handler by calling its 'put' routine. The put /// routine also receives a 'bundle' parameter, resulting in the /// processed data being distributed back into a multidimensional /// array (or a set of them, like the input). /// /// This mechanism sounds complicated, but buffering the data for /// processing (which oftentimes has to look at the data several times) /// is usually more efficient than operating on the data in their /// in-array locations, which are often widely distributed, making /// the memory access slow. On top of the memory efficiency gain, /// there is another aspect: by choosing the bundle size wisely, /// the buffered data can be processed by vector code. Even if the /// data aren't explicit SIMD vectors (which is an option), the /// simple fact that they 'fit' allows the optimizer to autovectorize /// the code, a technique which I call 'goading': You present the /// data in vector-friendly guise and thereby lure the optimizer to /// do the right thing. Another aspect of buffering is that the /// buffer can use a specific data type best suited to the arithmetic /// operation at hand which may be different from the source and target /// data.
This is especially useful if incoming data are of an integral /// type: operating directly on integers would spoil the data, but if /// the buffer is set up to contain a real type, the data are lifted /// to it on arrival in the buffer and processed with float maths. /// A drawback to this method of dealing with integral data is the fact /// that, when filtering nD data along several axes, intermediate /// results are stored back to the integral type after processing along /// each axis, accruing quantization errors with each pass. If this is /// an issue - like with high-dimensional data or insufficient dynamic /// range - please consider moving to a real data type before filtering. /// /// Note that this code operates on arrays of fundamentals. The code /// calling this routine will have element-expanded data which weren't /// fundamentals in the first place. This expansion helps automatic /// vectorization, and for explicit vectorization with Vc it is even /// necessary. /// /// Also note that this routine operates in single-threaded code: /// It's invoked via vspline::multithread, and each worker thread will /// perform its own call to 'present'. This is why the first argument /// is a range (containing the range of the partitioning assigned to /// the current worker thread) and why the other arguments come in as /// pointers, where I'd usually pass by reference. // we start out with 'present', which takes plain MultiArrayViews // for input and output. This is the simpler case. 'vpresent', which // follows, is structurally similar but has to deal with the extra // complication of processing 'stacks' instead of single arrays.
template < typename source_view_type , typename target_view_type , typename stripe_handler_type > void present ( vspline::index_range_type range , const source_view_type * p_source , target_view_type * p_target , const typename stripe_handler_type::arg_type * p_args , int axis ) { enum { dimension = source_view_type::actual_dimension } ; enum { vsize = stripe_handler_type::vsize } ; // get references to the source and target views const source_view_type & source ( *p_source ) ; target_view_type & target ( *p_target ) ; // get the total length of the axis we want to process std::ptrdiff_t count = source.shape ( axis ) ; // set up the 'stripe handler' which holds the buffer(s) // and calls the filter. It's important that this does not // happen until now when we've landed in the code executed // by the worker threads, since the stripe_handler holds // state which must not be shared between threads. stripe_handler_type handler ( *p_args , count ) ; // get the data types we're dealing with. These are fundamentals, // since the arrays have been element-expanded for processing. typedef typename source_view_type::value_type stype ; typedef typename target_view_type::value_type ttype ; // take a slice from the source array orthogonal to the processing // axis. That's where the starting points of the 1D subarrays are. Then // obtain a MultiCoordinateIterator over the indexes in the slice auto sample_slice = source.bindAt ( axis , 0 ) ; vigra::MultiCoordinateIterator < dimension - 1 > mci ( sample_slice.shape() ) ; // range is simply a 1D index, since we're using an index_range. // Adding 1D offsets to MultiCoordinateIterators forwards them // by so many steps. So by adding range[0] we have the first starting // point to be processed by this job, and adding range[1] gives the // limit (both expressed as nD indexes into the slice) auto sliter = mci + range[0] ; auto sliter_end = mci + range[1] ; // shape_type can hold an nD index into the slice, just what // sliter refers to. 
typedef vigra::TinyVector < std::ptrdiff_t , dimension - 1 > shape_type ; // set of indexes for one run. Note the initialization with // the first index, guarding against 'narrow stripes'. vigra::TinyVector < shape_type , vsize > indexes { *sliter } ; // set of offsets into the source slice which will be used for // gather/scatter. These will be equivalent to the indexes above, // 'condensed' by applying the stride and summing up. vigra::TinyVector < std::ptrdiff_t , vsize > offsets ; // now we iterate over the range of 1D subarrays we've been assigned to. while ( sliter < sliter_end ) { // get vsize starting indexes, save them in 'indexes' int i ; for ( i = 0 ; i < vsize && sliter < sliter_end ; ++i , ++sliter ) { indexes[i] = *sliter ; } // process the source array auto stride = source.stride ( axis ) ; auto size = source.shape ( axis ) ; auto source_slice = source.bindAt ( axis , 0 ) ; auto source_base_adress = source_slice.data() ; // obtain a set of offsets from the set of indexes by 'condensing' // the nD index into an offset - by applying the strides and summing up for ( int e = 0 ; e < vsize ; e++ ) { offsets[e] = sum ( source_slice.stride() * indexes[e] ) ; } // form a 'bundle' to pass the data to the handler bundle < stype , vsize > bi ( source_base_adress , offsets.data() , stride , size ) ; // now use the bundle to move the data to the handler's buffer handler.get ( bi , 0 , i ) ; // now call the handler's operator(). This performs the actual // filtering of the data handler() ; // now empty out the buffer to the target array, using pretty much // the same set of operations as was used for fetching the data // from source. 
stride = target.stride ( axis ) ; size = target.shape ( axis ) ; auto target_slice = target.bindAt ( axis , 0 ) ; auto target_base_adress = target_slice.data() ; for ( int e = 0 ; e < vsize ; e++ ) offsets[e] = sum ( target_slice.stride() * indexes[e] ) ; bundle < ttype , vsize > bo ( target_base_adress , offsets.data() , stride , size ) ; handler.put ( bo , 0 , i ) ; } } /// vpresent is a variant of 'present' processing 'stacks' of arrays. /// See 'present' for discussion. This variant of 'present' will rarely /// be used. Having it does no harm but if you study the code, you may /// safely ignore it unless you are actually using single-axis filtering /// of stacks of arrays. The code is structurally similar to 'present', /// with the extra complication of processing stacks instead of single /// arrays. template < typename source_view_type , typename target_view_type , typename stripe_handler_type > void vpresent ( vspline::index_range_type range , const std::vector < source_view_type > * p_source , std::vector < target_view_type > * p_target , const typename stripe_handler_type::arg_type * p_args , int axis ) { enum { dimension = source_view_type::actual_dimension } ; enum { vsize = stripe_handler_type::vsize } ; // get references to the std::vectors holding source and target views const std::vector < source_view_type > & source ( *p_source ) ; std::vector < target_view_type > & target ( *p_target ) ; // get the total length of the axis we want to process std::ptrdiff_t count = 0 ; for ( auto & e : source ) count += e.shape ( axis ) ; // set up the 'stripe handler' which holds the buffer(s) // and calls the filter. It's important that this does not // happen until now when we've landed in the code executed // by the worker threads, since the stripe_handler holds // state which must not be shared between threads. stripe_handler_type handler ( *p_args , count ) ; // get the data types we're dealing with. These are fundamentals, // since the arrays have been element-expanded for processing.
typedef typename source_view_type::value_type stype ; typedef typename target_view_type::value_type ttype ; // take a slice from the first source array orthogonal to the processing // axis. That's where the starting points of the 1D subarrays are. Then // obtain a MultiCoordinateIterator over the indexes in the slice auto sample_slice = source[0].bindAt ( axis , 0 ) ; vigra::MultiCoordinateIterator < dimension - 1 > mci ( sample_slice.shape() ) ; // range is simply a 1D index, since we're using an index_range. // Adding 1D offsets to MultiCoordinateIterators forwards them // by so many steps. So by adding range[0] we have the first starting // point to be processed by this job, and adding range[1] gives the // limit (both expressed as nD indexes into the slice) auto sliter = mci + range[0] ; auto sliter_end = mci + range[1] ; // shape_type can hold an nD index into the slice, just what // sliter refers to. typedef vigra::TinyVector < std::ptrdiff_t , dimension - 1 > shape_type ; // set of indexes for one run. Note the initialization with // the first index, guarding against 'narrow stripes'. vigra::TinyVector < shape_type , vsize > indexes { *sliter } ; // set of offsets into the source slice which will be used for // gather/scatter. These will be equivalent to the indexes above, // 'condensed' by applying the stride and summing up. vigra::TinyVector < std::ptrdiff_t , vsize > offsets ; // now we iterate over the range of 1D subarrays we've been assigned to. while ( sliter < sliter_end ) { // get vsize starting indexes, save them in 'indexes' int i ; for ( i = 0 ; i < vsize && sliter < sliter_end ; ++i , ++sliter ) { indexes[i] = *sliter ; } // iterate over the input arrays, loading data into the buffer // from all arrays in turn, using the same set of indexes. // 'progress' holds the part of 'count' that has been transferred // already. std::ptrdiff_t progress = 0 ; // now iterate over the source arrays. 
for ( auto & input : source ) { auto source_stride = input.stride ( axis ) ; auto part_size = input.shape ( axis ) ; auto slice = input.bindAt ( axis , 0 ) ; auto source_base_adress = slice.data() ; // obtain a set of offsets from the set of indexes by 'condensing' // the nD index into an offset - by applying the strides and summing up for ( int e = 0 ; e < vsize ; e++ ) { offsets[e] = sum ( slice.stride() * indexes[e] ) ; } // form a 'bundle' to pass the data to the handler bundle < stype , vsize > bi ( source_base_adress , offsets.data() , source_stride , part_size ) ; // now use the bundle to fill part_size entries in the handler's // buffer, starting at 'progress'. // then carry on with the next input array, if any handler.get ( bi , progress , i ) ; progress += part_size ; } // now call the handler's operator(). This performs the actual // filtering of the data handler() ; // now empty out the buffer to the std::vector of target arrays, // using pretty much the same set of operations as was used for // fetching the data from source. progress = 0 ; for ( auto & output : target ) { auto target_stride = output.stride ( axis ) ; auto part_size = output.shape ( axis ) ; auto slice = output.bindAt ( axis , 0 ) ; auto target_base_adress = slice.data() ; for ( int e = 0 ; e < vsize ; e++ ) offsets[e] = sum ( slice.stride() * indexes[e] ) ; bundle < ttype , vsize > bo ( target_base_adress , offsets.data() , target_stride , part_size ) ; handler.put ( bo , progress , i ) ; progress += part_size ; } } } /// struct separable_filter is the central object used for 'wielding' /// filters. The filters themselves are defined as 1D operations, which /// is sufficient for a separable filter: the 1D operation is applied /// to each axis in turn. If the *data* themselves are 1D, this is /// inefficient if the run of data is very long: we'd end up with a /// single thread processing the data without vectorization. 
So for this /// special case, we use a bit of trickery: long runs of 1D data are /// folded up, processed as 2D (with multithreading and vectorization) /// and the result of this operation, which isn't correct everywhere, /// is 'mended' where it is wrong. If the data are nD, we process them /// by buffering chunks collinear to the processing axis and applying /// the 1D filter to these chunks. 'Chunks' isn't quite the right word /// to use here - what we're buffering are 'bundles' of 1D subarrays, /// where a bundle holds as many 1D subarrays as a SIMD vector is wide. /// This makes it possible to process the buffered data with vectorized /// code. While most of the time the buffering will simply copy data into /// and out of the buffer, we use a distinct data type for the buffer /// which makes sure that arithmetic can be performed in floating point /// and with sufficient precision to do the data justice. With this /// provision we can safely process arrays of integral type. Such data /// are 'promoted' to this type when they are buffered and converted to /// the result type afterwards. Of course there will be quantization /// errors if the data are converted to an integral result type; it's /// best to use a real result type. /// The type for arithmetic operations inside the filter is fixed via /// stripe_handler_type, which takes a template argument '_math_ele_type'. /// This way, the arithmetic type is distributed consistently. /// Also note that an integral target type will receive the data via a /// simple type conversion and not with saturation arithmetic. If this /// is an issue, filter to a real-typed target and process separately. /// A good way of using integral data is to have integral input /// and real-typed output. Promoting the integral data to a real type /// preserves them precisely, and the 'exact' result is then stored in /// floating point.
With such a scheme, raw data (like image data, /// which are often 8 or 16 bit integers) can be 'sucked in' without /// need for previous conversion, producing b-spline coefficients /// in, say, float for further processing. template < typename input_array_type , typename output_array_type , typename stripe_handler_type > struct separable_filter { enum { dimension = input_array_type::actual_dimension } ; static_assert ( dimension == output_array_type::actual_dimension , "separable_filter: input and output array type must have the same dimension" ) ; typedef typename input_array_type::value_type in_value_type ; typedef typename output_array_type::value_type out_value_type ; enum { channels = vigra::ExpandElementResult < in_value_type > :: size } ; static_assert ( channels == vigra::ExpandElementResult < out_value_type > :: size , "separable_filter: input and output data type must have the same number of channels" ) ; typedef typename vigra::ExpandElementResult < in_value_type > :: type in_ele_type ; typedef typename vigra::ExpandElementResult < out_value_type > :: type out_ele_type ; typedef std::integral_constant < bool , dimension == 1 > is_1d_type ; typedef std::integral_constant < bool , channels == 1 > is_1_channel_type ; /// this is the standard entry point to the separable filter code /// for processing *all* axes of an array. first we use a dispatch /// to separate processing of 1D data from processing of nD data. template < class filter_args > // may be single argument or a std::vector void operator() ( const input_array_type & input , output_array_type & output , const filter_args & handler_args , int njobs = vspline::default_njobs ) const { // we use a dispatch depending on whether data are 1D or nD arrays _on_dimension ( is_1d_type() , input , output , handler_args , njobs ) ; } // _on_dimension differentiates between 1D and nD data. We don't // look at the arguments - they are simply forwarded to either // _process_1d or _process_nd. 
template < typename ... types > void _on_dimension ( std::true_type , // 1D data types ... args ) const { // data are 1D. unpack the variadic content and call // the specialized method _process_1d ( args ... ) ; } template < typename ... types > void _on_dimension ( std::false_type , // nD data types ... args ) const { // data are nD. unpack the variadic content and call // the code for nD processing. _process_nd ( args ... ) ; } /// specialized processing of 1D input/output. /// We have established that the data are 1D. /// we have received a std::vector of handler arguments. /// It has to contain precisely one element which we unpack /// and use to call the overload below. template < typename in_vt , typename out_vt > void _process_1d ( const in_vt & input , out_vt & output , const std::vector < typename stripe_handler_type::arg_type > & handler_args , int njobs ) const { assert ( handler_args.size() == 1 ) ; _process_1d ( input , output , handler_args[0] , njobs ) ; } /// specialized processing of 1D input/output. /// We have established that the data are 1D and we have /// a single handler argument. /// This routine may come as a surprise and it's quite long /// and complex. The strategy is this: /// - if the data are 'quite short', simply run a 1D filter /// directly on the data, without any multithreading or /// vectorization. If the user has specified 'zero tolerance', /// do the same. /// - otherwise employ 'fake 2D processing': pretend the /// data are 2D, filter them with 2D code (which offers /// multithreading and vectorization) and then 'mend' /// the result, which is wrong in parts due to the /// inappropriate processing. /// expect 'fake 2D processing' to kick in for buffer sizes /// somewhere in the low thousands, to give you a rough idea. /// All data paths in this routine make sure that the maths /// are done in math_type, there won't be storing of /// intermediate values to a lesser type. 
If the user /// has specified 'zero tolerance' and the output type is not /// the same as math_type, we have a worst-case scenario where /// the entire length of data is buffered in math_type and the /// operation is single-threaded and unvectorized, but this /// should rarely happen and requires the user to explicitly /// override the defaults. If the data are too short for fake /// 2D processing, the operation will also fail to multithread /// or vectorize. // TODO: establish the cost of powering up the multithreaded data // processing to set a lower limit for data sizes which should be // processed with several threads: the overhead for small data sets // might make multithreading futile. template < typename in_vt , typename out_vt > void _process_1d ( const in_vt & input , out_vt & output , const typename stripe_handler_type::arg_type & handler_args , int njobs ) const { typedef typename in_vt::value_type in_value_type ; typedef typename out_vt::value_type out_value_type ; // we'll need to access the 'raw' filter. To specify its type in // agreement with 'stripe_handler_type', we glean math_ele_type // from there and construct math_type from it. typedef typename stripe_handler_type::math_ele_type math_ele_type ; typedef canonical_type < math_ele_type , channels > math_type ; // obtain a raw filter capable of processing math_type auto raw_filter = stripe_handler_type::template get_raw_filter < math_type > ( handler_args ) ; // right now, we only need the filter's support width, but we // may use the filter further down.
const int bands = channels ; int runup = raw_filter.get_support_width() ; // if we can multithread, start out with as many lanes // as the desired number of threads int lanes = njobs ; enum { vsize = stripe_handler_type::vsize } ; // the number of lanes is multiplied by the // number of elements a vector-friendly type can handle lanes *= vsize ; // the absolute minimum to successfully run the fake 2D filter is this: // TODO we might raise the threshold, min_length, here int min_length = 4 * runup * lanes + 2 * runup ; // runup == INT_MAX signals that fake 2D processing is inappropriate. // if input is too short to bother with fake 2D, just single-lane it if ( runup == INT_MAX || input.shape(0) < min_length ) { lanes = 1 ; } else { // input is larger than the absolute minimum, maybe we can even increase // the number of lanes some more? we'd like to do this if the input is // very large, since we use buffering and don't want the buffers to become // overly large. But the smaller the run along the split x axis, the more // incorrect margin values we have to mend, so we need a compromise. // assume a 'good' length for input: some length where further splitting // would not be wanted anymore. TODO: do some testing, find a good value int good_length = 64 * runup * lanes + 2 * runup ; int split = 1 ; // suppose we split input.shape(0) in ( 2 * split ) parts, is it still larger // than this 'good' length? If not, leave split factor as it is. while ( input.shape(0) / ( 2 * split ) >= good_length ) { // if yes, double split factor, try again split *= 2 ; } lanes *= split ; // increase number of lanes by additional split } // if there's only one lane we fall back to single-threaded // operation, using a 'raw' filter directly processing the // input - either producing the output straight away or, // intermediately, its representation in math_type.
if ( lanes == 1 ) { // we look at the data first: if out_value_type is the same type // as math_type, we can use the raw filter directly on input and // output. This is also possible if the filter is single-pass, // because a single-pass filter does not need to store intermediate // results - so convolution is okay, but b-spline prefiltering // is not. if ( std::is_same < out_value_type , math_type > :: value || stripe_handler_type::is_single_pass ) { auto raw_filter = stripe_handler_type::template get_raw_filter < in_value_type , out_value_type , math_type > ( handler_args ) ; raw_filter.solve ( input , output ) ; } else { // we can't use the easy option above. So we'll have to create // a buffer of math_type, use that as target, run the filter // and copy the result to output. This is potentially expensive: // the worst case is that we have to create a buffer which is // larger than the whole input signal (if math_type's size is // larger than in_value_type's) - and on top, the operation is // single-threaded and unvectorized. This should rarely happen // for long signals. Mathematically, we're definitely on the // safe side, provided the user hasn't chosen an unsuitable // math_type. vigra::MultiArray < 1 , math_type > buffer ( input.shape() ) ; auto raw_filter = stripe_handler_type::template get_raw_filter < in_value_type , math_type , math_type > ( handler_args ) ; raw_filter.solve ( input , buffer ) ; auto trg = output.begin() ; for ( auto const & src : buffer ) { *trg = out_value_type ( src ) ; ++trg ; } } return ; // return directly. we're done } // the input qualifies for fake 2D processing. // we want as many chunks as we have lanes.
// There may be some data left beyond the chunks (tail_size of value_type)

int core_size = input.shape(0) ;
int chunk_size = core_size / lanes ;
core_size = lanes * chunk_size ;
int tail_size = input.shape(0) - core_size ;

// just double-check

assert ( core_size + tail_size == input.shape(0) ) ;

// now here's the strategy: we treat the data as if they were 2D. This will
// introduce errors along the 'vertical' margins, since there the 2D treatment
// will start with some boundary condition along the x axis instead of looking
// at the neighbouring line where the actual continuation is.

// first we deal with the very beginning and end of the signal. This requires
// special treatment, because here we want the boundary conditions to take
// effect. So we copy the beginning and end of the signal to a buffer, being
// generous with how many data we pick. The resulting buffer will have an
// unusable part in the middle, where tail follows head, but since we've made
// sure that this location is surrounded by enough 'runup' data, the effect
// will only be detectable at +/- runup from the point where tail follows head.
// The beginning of head and the end of tail are at the beginning and end
// of the buffer, though, so that applying the boundary condition will
// have the desired effect. What we'll actually use of the buffer is not
// the central bit with the effects of the clash of head and tail, but
// only the bits at the ends which aren't affected because they are far enough
// away. Another way of looking at this operation is that we 'cut out' a large
// central section of the data and process the remainder, ignoring the cut-out
// part. Then we only use that part of the result which is 'far enough' away
// from the cut to be unaffected by it.

// note how this code fixes a bug in my initial implementation, which produced
// erroneous results with periodic splines, because the boundary condition
// was not properly honoured.
// calculate the sizes of the parts of the signal we'll put into the buffer int front = 2 * runup ; int back = tail_size + 2 * runup ; int total = front + back ; // create the buffer and copy the beginning and end of the signal into it. // Note how the data are converted to math_type to do the filtering vigra::MultiArray < 1 , math_type > head_and_tail ( total ) ; auto target_it = head_and_tail.begin() ; auto source_it = input.begin() ; for ( int i = 0 ; i < front ; i++ ) { *target_it = math_type ( *source_it ) ; ++target_it ; ++source_it ; } source_it = input.end() - back ; for ( int i = 0 ; i < back ; i++ ) { *target_it = math_type ( *source_it ) ; ++target_it ; ++source_it ; } // this buffer is submitted to the 'raw' filter. After the call, the buffer // has usable data for the very beginning and end of the signal. raw_filter.solve ( head_and_tail , head_and_tail ) ; // set up two MultiArrayViews corresponding to the portions of the data // we copied into the buffer. The first bit of 'head' and the last bit // of 'tail' hold valid data and will be used further down. vigra::MultiArrayView < 1 , math_type > head ( vigra::Shape1 ( front ) , head_and_tail.data() ) ; vigra::MultiArrayView < 1 , math_type > tail ( vigra::Shape1 ( back ) , head_and_tail.data() + front ) ; // head now has runup correct values at the beginning, succeeded by runup // invalid values, and tail has tail_size + runup correct values at the end, // preceded by runup values which aren't usable. // now we create a fake 2D view to the margin of the data. Note how we let // the view begin 2 * runup before the end of the first line, capturing the // 'wraparound' right in the middle of the view // KFJ 2018-02-11 both here, and a bit further down, where 'margin_target' // is set up, I had forgotten to multiply the offset which is added to // *.data() with the appropriate stride, resulting in memory errors where // the stride wasn't 1. since this rarely happens, I did not notice it // until now. 
vigra::MultiArrayView < 2 , in_value_type > fake_2d_margin ( vigra::Shape2 ( 4 * runup , lanes - 1 ) , vigra::Shape2 ( input.stride(0) , input.stride(0) * chunk_size ) , input.data() + input.stride(0) * ( chunk_size - 2 * runup ) ) ; // again we create a buffer and filter into the buffer vigra::MultiArray < 2 , out_value_type > margin_buffer ( fake_2d_margin.shape() ) ; separable_filter < vigra::MultiArrayView < 2 , in_value_type > , vigra::MultiArrayView < 2 , out_value_type > , stripe_handler_type >() ( fake_2d_margin , margin_buffer , 0 , handler_args , njobs ) ; // now we have filtered data for the margins in margin_buffer, // of which the central half is usable, the remainder being runup data // which we'll ignore. Here's a view to the central half: vigra::MultiArrayView < 2 , out_value_type > margin = margin_buffer.subarray ( vigra::Shape2 ( runup , 0 ) , vigra::Shape2 ( 3 * runup , lanes - 1 ) ) ; // we create a view to the target array's margin which we intend // to overwrite, but the data will only be copied in from margin // after the treatment of the core. vigra::MultiArrayView < 2 , out_value_type > margin_target ( vigra::Shape2 ( 2 * runup , lanes - 1 ) , vigra::Shape2 ( output.stride(0) , output.stride(0) * chunk_size ) , output.data() + output.stride(0) * ( chunk_size - runup ) ) ; // next we 'fake' a 2D array from input and filter it to output, this may // be an in-place operation, since we've extracted all margin information // earlier and deposited what we need in buffers. 
vigra::MultiArrayView < 2 , in_value_type > fake_2d_source
  ( vigra::Shape2 ( chunk_size , lanes ) ,
    vigra::Shape2 ( input.stride(0) , input.stride(0) * chunk_size ) ,
    input.data() ) ;

vigra::MultiArrayView < 2 , out_value_type > fake_2d_target
  ( vigra::Shape2 ( chunk_size , lanes ) ,
    vigra::Shape2 ( output.stride(0) , output.stride(0) * chunk_size ) ,
    output.data() ) ;

// now we filter the fake 2D source to the fake 2D target

separable_filter < vigra::MultiArrayView < 2 , in_value_type > ,
                   vigra::MultiArrayView < 2 , out_value_type > ,
                   stripe_handler_type >()
  ( fake_2d_source , fake_2d_target , 0 , handler_args , njobs ) ;

// we now have filtered data in target, but the stripes along the margins
// in x-direction (1 runup wide) are wrong, because we applied whatever
// boundary conditions are inherent to the filter, while the data in fact
// continued from one line's end to the next one's beginning.
// this is why we have the data in 'margin', and we now copy them to the
// relevant section of 'target'

margin_target = margin ;

// finally we have to fix the first and last few values, which weren't
// touched by the margin operation (due to margin's offset and length)
// note how we move back from 'math_type' to 'out_value_type'.

for ( int i = 0 ; i < runup ; i++ )
  output[i] = out_value_type ( head[i] ) ;

int j = tail.size() - tail_size - runup ;

for ( int i = output.size() - tail_size - runup ;
      i < output.size() ; i++ , j++ )
  output[i] = out_value_type ( tail[j] ) ;

} // end of first _process_1d() overload

/// specialized processing of nD input/output. We have established
/// that the data are nD. Now we process the axes in turn, passing
/// the per-axis handler args.
void _process_nd ( const input_array_type & input , output_array_type & output , const std::vector < typename stripe_handler_type::arg_type > & handler_args , int njobs ) const { _process_nd ( input , output , 0 , handler_args [ 0 ] , njobs ) ; for ( int axis = 1 ; axis < dimension ; axis++ ) { _process_nd ( output , output , axis , handler_args [ axis ] , njobs ) ; } } // in the next section we have code processing nD data along // a specific axis. The code starts with an operator() overload // meant to be called from 'outside'. This is meant for cases // where filtering needs to be done differently for different // axes. After that we have the actual processing code. /// this operator() overload for single-axis processing takes /// plain arrays for input and output, they may be either 1D or nD. /// again we use _on_dimension, now with a different argument /// signature (we have 'axis' now). As _on_dimension is a /// variadic template, we 'reemerge' in the right place. void operator() ( const input_array_type & input , output_array_type & output , int axis , const typename stripe_handler_type::arg_type & handler_args , int njobs = vspline::default_njobs ) const { _on_dimension ( is_1d_type() , input , output , axis , handler_args , njobs ) ; } /// processing of nD input/output. we have established that the data /// are nD. now we look at the data type. If the data are multi-channel, /// they need to be element-expanded for further processing, which /// is done by '_on_expand'. Note that there are two variants of /// _on_expand, one for plain arrays and one for stacks of arrays, /// which are coded after the operator() overload taking stacks of /// arrays further down. template < typename ... types > void _process_nd ( types ... args ) const { // we're all set for processing a single axis of data. // now we have a dispatch on is_1_channel_type, because if the // data are multi-channel, we want to element-expand the arrays. 
_on_expand ( is_1_channel_type() , args ... ) ; } /// variant of _on_expand for single arrays. this overload is called /// if the data are multi-channel. we element-expand the arrays, then /// call the single-channel overload below void _on_expand ( std::false_type , // is_1_channel_type() , const input_array_type & input , output_array_type & output , int axis , const typename stripe_handler_type::arg_type & handler_args , int njobs ) const { typedef vigra::MultiArrayView < dimension + 1 , in_ele_type > e_input_array_type ; typedef vigra::MultiArrayView < dimension + 1 , out_ele_type > e_output_array_type ; // note how we expand the element channel to dimension 0, to make sure // that adjacent 1D subarrays will likely be processed together. e_input_array_type source = input.expandElements ( 0 ) ; e_output_array_type target = output.expandElements ( 0 ) ; // with the element-expanded data at hand, we can now delegate to // the routine below, which deals with single-channel data _on_expand ( std::true_type() , // the expanded arrays are single-channel source , target , axis + 1 , // processing axis is now one higher handler_args , njobs ) ; } /// Variant of _on_expand for single arrays. This is the single-channel /// overload. The arrays now hold fundamentals, either because that was /// their original data type or because they have been element-expanded. /// Now we finally get to do the filtering. /// Note how we introduce input and output as templates, since we can't /// be sure of their type: they may or may not have been element-expanded. template < typename in_t , typename out_t > void _on_expand ( std::true_type , // is_1_channel_type() , const in_t & input , out_t & output , int axis , const typename stripe_handler_type::arg_type & handler_args , int njobs ) const { // 'present' is the workhorse routine handling the buffering, // unbuffering and handler invocation. We'll pass this routine // to 'multithread'. 
auto pf = & present < in_t , out_t , stripe_handler_type > ; // 'present' processes batches of 1D subarrays collinear to the processing // axis. We obtain a partitioning which provides these batches, coded as // index ranges. So if the arrays have Z 1D subarrays, the batches are // encoded as 0,A A,B ... Y,Z. We have 'size' 1D subarrays. So we form // an index range from 0 to 'size' auto size = input.size() / input.shape ( axis ) ; vspline::index_range_type ir { 0 , size } ; // for this specific partitioning, we have to pass 'vsize'. the // partitioner will set up the partitions so that they contain multiples // of 'vsize' if possible. vsize here is the same as the size of elements // in the buffer used for filtering. enum { vsize = stripe_handler_type::vsize } ; // now we use index_range_splitter (see multithread.h) // to produce the partitioning. auto partitioning = vspline::index_range_splitter::part ( ir , njobs , vsize ) ; // finally we use multithread() to distribute the partitions to individual // jobs which are executed by vspline's thread pool. After multithread // returns, we have the result in 'output', and we're done. vspline::multithread ( pf , partitioning , &input , &output , &handler_args , axis ) ; } /// this operator() overload for single-axis processing takes /// std::vectors ('stacks') of arrays. this is only supported /// for nD data. This is a rarely-used variant; throughout vspline /// there isn't currently any place where this routine is called, /// but it's needed for some special applications. If you are studying /// the code, you may safely disregard the remainder of the code in /// this class definition; the two _on_expand variants below are /// also for the special case of 'stacks' of arrays. 
// With a bit of adapter code, this path could be used for
// processing vigra's 'chunked arrays': for every axis, put
// all sequences of chunks collinear to that axis into a
// std::vector (as MultiArrayViews, not the data themselves),
// then pass these stacks to this routine. TODO: try
// As long as one sequence of chunks fits into memory, the
// process should be efficient, allowing filtering of very large
// data sets.

void operator() ( const std::vector < input_array_type > & input ,
                  std::vector < output_array_type > & output ,
                  int axis ,
                  const typename stripe_handler_type::arg_type & handler_args ,
                  int njobs = vspline::default_njobs ) const
{
  static_assert ( ! is_1d_type() ,
    "processing of stacked 1D arrays is not supported" ) ;

  _process_nd ( input , output , axis , handler_args , njobs ) ;
}

/// variant of _on_expand for stacks of arrays.
/// this overload is called if the data are multi-channel.
/// we element-expand the arrays, then call the single-channel
/// overload below

template < typename in_vt , typename out_vt >
void _on_expand ( std::false_type , // is_1_channel_type() ,
                  const std::vector < in_vt > & input ,
                  std::vector < out_vt > & output ,
                  int axis ,
                  const typename stripe_handler_type::arg_type & handler_args ,
                  int njobs ) const
{
  typedef vigra::MultiArrayView < dimension + 1 , in_ele_type >
    e_input_array_type ;

  typedef vigra::MultiArrayView < dimension + 1 , out_ele_type >
    e_output_array_type ;

  // note how we expand the element channel to dimension 0, to make sure
  // that adjacent 1D subarrays will be processed together.
std::vector < e_input_array_type > source ;
for ( auto & e : input )
  source.push_back ( e.expandElements ( 0 ) ) ;

std::vector < e_output_array_type > target ;
for ( auto & e : output )
  target.push_back ( e.expandElements ( 0 ) ) ;

// with the element-expanded data at hand, we can now delegate to
// the routine below, which deals with single-channel data

_on_expand ( std::true_type() , // the expanded arrays are single-channel
             source ,
             target ,
             axis + 1 , // processing axis is now one higher
             handler_args ,
             njobs ) ;
}

/// variant of _on_expand for stacks of arrays.
/// this is the single-channel overload. The arrays now hold fundamentals,
/// either because that was their original data type or because they have
/// been element-expanded. We end up in this routine for all processing
/// of stacks and finally get to do the filtering.

template < typename in_vt , typename out_vt >
void _on_expand ( std::true_type , // is_1_channel_type() ,
                  const std::vector < in_vt > & input ,
                  std::vector < out_vt > & output ,
                  int axis ,
                  const typename stripe_handler_type::arg_type & handler_args ,
                  int njobs ) const
{
  // 'vpresent' is the workhorse routine handling the buffering,
  // unbuffering and handler invocation. We'll pass this routine
  // to 'multithread'.

  auto pf = & vpresent < in_vt , out_vt , stripe_handler_type > ;

  // 'vpresent' processes batches of 1D subarrays collinear to the processing
  // axis. We obtain a partitioning which provides these batches, coded as
  // index ranges. So if the arrays have Z 1D subarrays, the batches are
  // encoded as 0,A A,B ... Y,Z. We have 'size' 1D subarrays. So we form
  // an index range from 0 to 'size'

  auto size = input[0].size() / input[0].shape ( axis ) ;
  vspline::index_range_type ir { 0 , size } ;

  // for this specific partitioning, we have to pass 'vsize'. the
  // partitioner will set up the partitions so that they contain multiples
  // of 'vsize' if possible. vsize here is the same as the size of elements
  // in the buffer used for filtering.
enum { vsize = stripe_handler_type::vsize } ;

// now we use index_range_splitter (see multithread.h)
// to produce the partitioning.

auto partitioning = vspline::index_range_splitter::part
  ( ir , njobs , vsize ) ;

// finally we use multithread() to distribute the partitions to individual
// jobs which are executed by vspline's thread pool. After multithread
// returns, we have the result in 'output', and we're done.

vspline::multithread ( pf , partitioning ,
                       &input , &output , &handler_args , axis ) ;

}

} ; // struct separable_filter

} ; // namespace detail

/// vspline::filter is the common entry point for filter operations
/// in vspline. This routine does not yet do any processing; its
/// purpose is to convert its arguments to 'canonical' format
/// and then call the actual filter code in namespace detail.
/// It also determines the type used for arithmetic operations.
/// The type specification for input and output assures that only
/// arrays with the same dimensionality are accepted, and a static
/// assertion makes sure the number of channels matches. Canonical
/// form means that input and output value type are either
/// fundamental (for single-channel data) or TinyVectors of a
/// fundamental data type. This way, the input and output are
/// presented in a neutral form, ignoring all specifics of
/// T1 and T2, which may be TinyVectors, complex, RGBValue etc.

template < typename in_type ,
           typename out_type ,
           unsigned int D ,
           class filter_type ,
           typename ... types >
void filter ( const vigra::MultiArrayView < D , in_type > & input ,
              vigra::MultiArrayView < D , out_type > & output ,
              types ... args )
{
  // find out the elementary (fundamental) type of in_type and out_type
  // by using vigra's ExpandElementResult mechanism.
typedef typename vigra::ExpandElementResult < in_type > :: type in_ele_type ; typedef typename vigra::ExpandElementResult < out_type > :: type out_ele_type ; // get the number of channels and make sure it's consistent enum { channels = vigra::ExpandElementResult < in_type > :: size } ; static_assert ( channels == vigra::ExpandElementResult < out_type > :: size , "separable_filter: input and output data type must have the same number of channels" ) ; // produce the canonical types for both data types and arrays typedef canonical_type < in_type > canonical_in_value_type ; typedef vigra::MultiArrayView < D , canonical_in_value_type > cn_in_type ; typedef canonical_type < out_type > canonical_out_value_type ; typedef vigra::MultiArrayView < D , canonical_out_value_type > cn_out_type ; // call separable_filter with arrays reinterpreted as canonical types, // and all other arguments unchecked and unchanged. detail::separable_filter < cn_in_type , cn_out_type , filter_type >() ( reinterpret_cast < const cn_in_type & > ( input ) , reinterpret_cast < cn_out_type & > ( output ) , args ... ) ; } } ; // namespace vspline #endif // VSPLINE_FILTER_H kfj-vspline-4b365417c271/map.h000066400000000000000000000447761333775006700156630ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. 
Jahnke */ /* */ /* The git repository for this software is at */ /* */ /* https://bitbucket.org/kfj/vspline */ /* */ /* Please direct questions, bug reports, and contributions to */ /* */ /* kfjahnke+vspline@gmail.com */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /*! \file map.h \brief code to handle out-of-bounds coordinates. Incoming coordinates may not be inside the range which can be evaluated by a functor. There is no one correct way of dealing with out-of-bounds coordinates, so I provide a few common ways of doing it. If the 'standard' gate types don't suffice, the classes provided here can serve as templates. The basic type handling the operation is a 'gate type', which 'treats' a single value or single simdized value. 
For nD coordinates, we use a set of these gate_type objects, one for
each component; each one may be of a distinct type specific to the
axis the component belongs to.

Application of the gates is via a 'mapper' object, which contains the
gate_type objects and applies them to the components in turn.

The mapper object is a functor which converts an arbitrary incoming
coordinate into a 'treated' coordinate (or, with REJECT mode, may
throw an out_of_bounds exception).

mapper objects are derived from vspline::unary_functor, so they fit
in well with other code in vspline and can easily be combined with
other unary_functor objects, or used stand-alone. They are used inside
vspline to implement the factory function vspline::make_safe_evaluator,
which chains a suitable mapper and an evaluator to create an object
allowing safe evaluation of a b-spline with arbitrary coordinates,
where out-of-range coordinates are mapped to the defined range in a
way fitting the b-spline's boundary conditions.
*/

#ifndef VSPLINE_MAP_H
#define VSPLINE_MAP_H

#include "unary_functor.h"
#include <assert.h>

namespace vspline {

/// class pass_gate passes its input to its output unmodified.
template < typename rc_type , size_t _vsize = vspline::vector_traits < rc_type > :: size > struct pass_gate : public vspline::unary_functor < rc_type , rc_type , _vsize > { template < class T > void eval ( const T & c , T & result ) const { result = c ; } } ; /// factory function to create a pass_gate type functor template < typename rc_type , size_t _vsize = vspline::vector_traits < rc_type > :: size > vspline::pass_gate < rc_type , _vsize > pass() { return vspline::pass_gate < rc_type , _vsize >() ; } /// reject_gate throws vspline::out_of_bounds for invalid coordinates template < typename rc_type , size_t _vsize = vspline::vector_traits < rc_type > :: size > struct reject_gate : public vspline::unary_functor < rc_type , rc_type , _vsize > { const rc_type lower ; const rc_type upper ; reject_gate ( rc_type _lower , rc_type _upper ) : lower ( _lower ) , upper ( _upper ) { } ; void eval ( const rc_type & c , rc_type & result ) const { if ( c < lower || c > upper ) throw vspline::out_of_bounds() ; result = c ; } /// vectorized evaluation function. This is enabled only if vsize > 1 /// to guard against cases where vectorization is used but vsize is 1. /// Without the enable_if, we'd end up with two overloads with the /// same signature, since in_v and out_v collapse to in_type and out_type /// with vsize 1. typedef vspline::unary_functor < rc_type , rc_type , _vsize > base_type ; using typename base_type::in_v ; using typename base_type::out_v ; template < typename = std::enable_if < ( _vsize > 1 ) > > void eval ( const in_v & c , out_v & result ) const { if ( any_of ( ( c < lower ) | ( c > upper ) ) ) throw vspline::out_of_bounds() ; result = c ; } } ; /// factory function to create a reject_gate type functor given /// a lower and upper limit for the allowed range. 
template < typename rc_type , size_t _vsize = vspline::vector_traits < rc_type > :: size > vspline::reject_gate < rc_type , _vsize > reject ( rc_type lower , rc_type upper ) { return vspline::reject_gate < rc_type , _vsize > ( lower , upper ) ; } /// clamp gate clamps out-of-bounds values. clamp_gate takes /// four arguments: the lower and upper limit of the gate, and /// the values which are returned if the input is outside the /// range: 'lfix' if it is below 'lower' and 'ufix' if it is /// above 'upper' template < typename rc_type , size_t _vsize = vspline::vector_traits < rc_type > :: size > struct clamp_gate : public vspline::unary_functor < rc_type , rc_type , _vsize > { const rc_type lower ; const rc_type upper ; const rc_type lfix ; const rc_type ufix ; clamp_gate ( rc_type _lower , rc_type _upper , rc_type _lfix , rc_type _ufix ) : lower ( _lower <= _upper ? _lower : _upper ) , upper ( _upper >= _lower ? _upper : _lower ) , lfix ( _lower <= _upper ? _lfix : _ufix ) , ufix ( _upper >= _lower ? 
_ufix : _lfix )
{
  assert ( lower <= upper ) ;
} ;

/// simplified constructor, gate clamps to _lower and _upper

clamp_gate ( rc_type _lower ,
             rc_type _upper )
: clamp_gate ( _lower , _upper , _lower , _upper )
{ } ;

void eval ( const rc_type & c ,
            rc_type & result ) const
{
  if ( c < lower )
    result = lfix ;
  else if ( c > upper )
    result = ufix ;
  else
    result = c ;
}

typedef vspline::unary_functor < rc_type , rc_type , _vsize > base_type ;
using typename base_type::in_v ;
using typename base_type::out_v ;

template < typename = std::enable_if < ( _vsize > 1 ) > >
void eval ( const in_v & c ,
            out_v & result ) const
{
  result = c ;
  vspline::assign_if ( result , c < lower , lfix ) ;
  vspline::assign_if ( result , c > upper , ufix ) ;
}

} ;

/// factory function to create a clamp_gate type functor given
/// a lower and upper limit for the allowed range, and, optionally,
/// the values to use if incoming coordinates are out-of-range

template < typename rc_type ,
           size_t _vsize = vspline::vector_traits < rc_type > :: size >
vspline::clamp_gate < rc_type , _vsize >
clamp ( rc_type lower , rc_type upper , rc_type lfix , rc_type rfix )
{
  return vspline::clamp_gate < rc_type , _vsize >
    ( lower , upper , lfix , rfix ) ;
}

/// vectorized fmod function using std::trunc, which is fast, but
/// checking the result to make sure it's always <= rhs.

template < class rc_v >
rc_v v_fmod ( rc_v lhs ,
              const typename rc_v::value_type & rhs )
{
  rc_v help ( lhs ) ;
  help /= rhs ;
  help = trunc ( help ) ;
  help *= rhs ;
  lhs -= help ;

  // due to arithmetic imprecision, the result may come out >= rhs,
  // so we double-check and set the result to 0 when this occurs

  assign_if ( lhs , abs(lhs) >= abs(rhs) , 0 ) ;

  return lhs ;
}

/// mirror gate 'folds' coordinates into the range. From the infinite
/// number of mirror images resulting from mirroring the input on the
/// bounds, the only one inside the range is picked as the result.
/// When using this gate type with splines with MIRROR boundary conditions, /// if the shape of the core for the axis in question is M, _lower would be /// passed 0 and _upper M-1. /// For splines with REFLECT boundary conditions, we'd pass -0.5 to /// _lower and M-0.5 to upper, since here we mirror 'between bounds' /// and the defined range is wider. /// /// Note how this mode of 'mirroring' allows use of arbitrary coordinates, /// rather than limiting the range of acceptable input to the first reflection, /// as some implementations do. template < typename rc_type , size_t _vsize = vspline::vector_traits < rc_type > :: size > struct mirror_gate : public vspline::unary_functor < rc_type , rc_type , _vsize > { const rc_type lower ; const rc_type upper ; mirror_gate ( rc_type _lower , rc_type _upper ) : lower ( _lower <= _upper ? _lower : _upper ) , upper ( _upper >= _lower ? _upper : _lower ) { assert ( lower < upper ) ; } ; void eval ( const rc_type & c , rc_type & result ) const { rc_type cc ( c - lower ) ; auto w = upper - lower ; cc = std::abs ( cc ) ; // left mirror, v is now >= 0 if ( cc >= w ) { cc = fmod ( cc , 2 * w ) ; // map to one full period cc -= w ; // center cc = std::abs ( cc ) ; // map to half period cc = w - cc ; // flip } result = cc + lower ; } typedef vspline::unary_functor < rc_type , rc_type , _vsize > base_type ; using typename base_type::in_v ; using typename base_type::out_v ; template < typename = std::enable_if < ( _vsize > 1 ) > > void eval ( const in_v & c , out_v & result ) const { in_v cc ( c - lower ) ; auto w = upper - lower ; cc = abs ( cc ) ; // left mirror, v is now >= 0 auto mask = ( cc >= w ) ; if ( any_of ( mask ) ) { auto cm = v_fmod ( cc , 2 * w ) ; // map to one full period cm -= w ; // center cm = abs ( cm ) ; // map to half period cm = in_v(w) - cm ; // flip assign_if ( cc , mask , cm ) ; } result = cc + lower ; } } ; /// factory function to create a mirror_gate type functor given /// a lower and upper limit for the 
allowed range. template < typename rc_type , size_t _vsize = vspline::vector_traits < rc_type > :: size > vspline::mirror_gate < rc_type , _vsize > mirror ( rc_type lower , rc_type upper ) { return vspline::mirror_gate < rc_type , _vsize > ( lower , upper ) ; } /// the periodic mapping also folds the incoming value into the allowed range. /// The resulting value will be ( N * period ) from the input value and inside /// the range, period being upper - lower. /// For splines done with PERIODIC boundary conditions, if the shape of /// the core for this axis is M, we'd pass 0 to _lower and M to _upper. template < typename rc_type , size_t _vsize = vspline::vector_traits < rc_type > :: size > struct periodic_gate : public vspline::unary_functor < rc_type , rc_type , _vsize > { const rc_type lower ; const rc_type upper ; periodic_gate ( rc_type _lower , rc_type _upper ) : lower ( _lower <= _upper ? _lower : _upper ) , upper ( _upper >= _lower ? _upper : _lower ) { assert ( lower < upper ) ; } ; void eval ( const rc_type & c , rc_type & result ) const { rc_type cc = c - lower ; auto w = upper - lower ; if ( ( cc < 0 ) || ( cc >= w ) ) { cc = fmod ( cc , w ) ; if ( cc < 0 ) cc += w ; // due to arithmetic imprecision, even though cc < 0 // cc+w may come out == w, so we need to test again: if ( cc >= w ) cc = 0 ; } result = cc + lower ; } typedef vspline::unary_functor < rc_type , rc_type , _vsize > base_type ; using typename base_type::in_v ; using typename base_type::out_v ; template < typename = std::enable_if < ( _vsize > 1 ) > > void eval ( const in_v & c , out_v & result ) const { in_v cc ; cc = c - lower ; auto w = upper - lower ; auto mask_below = ( cc < 0 ) ; auto mask_above = ( cc >= w ) ; auto mask_any = mask_above | mask_below ; if ( any_of ( mask_any ) ) { auto cm = v_fmod ( cc , w ) ; assign_if ( cm , mask_below , cm + w ) ; // due to arithmetic imprecision, even though cc < 0 // cc+w may come out == w, so we need to test again: assign_if ( cm , ( cm >= w ) , 
0 ) ; assign_if ( cc , mask_any , cm ) ; } result = cc + lower ; } } ; /// factory function to create a periodic_gate type functor given /// a lower and upper limit for the allowed range. template < typename rc_type , size_t _vsize = vspline::vector_traits < rc_type > :: size > vspline::periodic_gate < rc_type , _vsize > periodic ( rc_type lower , rc_type upper ) { return vspline::periodic_gate < rc_type , _vsize > ( lower , upper ) ; } /// finally we define class mapper which is initialized with a set of /// gate objects (of arbitrary type) which are applied to each component /// of an incoming nD coordinate in turn. /// The trickery with the variadic template argument list is necessary, /// because we want to be able to combine arbitrary gate types (which /// have distinct types) to make the mapper as efficient as possible. /// the only requirement for a gate type is that it has to provide the /// necessary eval() functions. template < typename nd_rc_type , size_t _vsize , class ... gate_types > struct map_functor : public vspline::unary_functor < nd_rc_type , nd_rc_type , _vsize > { typedef typename vspline::unary_functor < nd_rc_type , nd_rc_type , _vsize > base_type ; typedef typename base_type::in_type in_type ; typedef typename base_type::out_type out_type ; enum { vsize = _vsize } ; enum { dimension = vigra::ExpandElementResult < nd_rc_type > :: size } ; // we hold the 1D mappers in a tuple typedef std::tuple < gate_types... > mvec_type ; // mvec holds the 1D gate objects passed to the constructor const mvec_type mvec ; // the constructor receives gate objects map_functor ( gate_types ... args ) : mvec ( args... ) { } ; // constructor variant taking a tuple of gates map_functor ( const mvec_type & _mvec ) : mvec ( _mvec ) { } ; // to handle the application of the 1D gates, we use a recursive // helper type which applies the 1D gate for a specific axis and // then recurses to the next axis until axis 0 is reached. 
// We also pass 'dimension' as template argument, so we can specialize
// for 1D operation (see below)

template < int level , int dimension , typename nd_coordinate_type >
struct _map
{
  void operator() ( const mvec_type & mvec ,
                    const nd_coordinate_type & in ,
                    nd_coordinate_type & out ) const
  {
    std::get<level>(mvec).eval ( in[level] , out[level] ) ;
    _map < level - 1 , dimension , nd_coordinate_type >()
      ( mvec , in , out ) ;
  }
} ;

// at level 0 the recursion ends

template < int dimension , typename nd_coordinate_type >
struct _map < 0 , dimension , nd_coordinate_type >
{
  void operator() ( const mvec_type & mvec ,
                    const nd_coordinate_type & in ,
                    nd_coordinate_type & out ) const
  {
    std::get<0>(mvec).eval ( in[0] , out[0] ) ;
  }
} ;

// here's the specialization for 1D operation

template < typename coordinate_type >
struct _map < 0 , 1 , coordinate_type >
{
  void operator() ( const mvec_type & mvec ,
                    const coordinate_type & in ,
                    coordinate_type & out ) const
  {
    std::get<0>(mvec).eval ( in , out ) ;
  }
} ;

// now we define eval for unvectorized and vectorized operation
// by simply delegating to struct _map at the top level.

template < class in_type , class out_type >
void eval ( const in_type & in ,
            out_type & out ) const
{
  _map < dimension - 1 , dimension , in_type >() ( mvec , in , out ) ;
}

} ;

/// factory function to create a mapper type functor given
/// a set of gate_type objects. Please see vspline::make_safe_evaluator
/// for code to automatically create a mapper object suitable for a
/// specific vspline::bspline.

template < typename nd_rc_type ,
           size_t _vsize = vspline::vector_traits < nd_rc_type > :: size ,
           class ... gate_types >
vspline::map_functor < nd_rc_type , _vsize , gate_types... >
mapper ( gate_types ... args )
{
  return vspline::map_functor < nd_rc_type , _vsize , gate_types... >
    ( args...
      ) ;
}

} ; // namespace vspline

#endif // #ifndef VSPLINE_MAP_H

kfj-vspline-4b365417c271/multithread.h

/************************************************************************/
/*                                                                      */
/*  vspline - a set of generic tools for creation and evaluation       */
/*  of uniform b-splines                                                */
/*                                                                      */
/*  Copyright 2015 - 2018 by Kay F. Jahnke                              */
/*                                                                      */
/*  The git repository for this software is at                          */
/*                                                                      */
/*  https://bitbucket.org/kfj/vspline                                   */
/*                                                                      */
/*  Please direct questions, bug reports, and contributions to          */
/*                                                                      */
/*  kfjahnke+vspline@gmail.com                                          */
/*                                                                      */
/*  Permission is hereby granted, free of charge, to any person        */
/*  obtaining a copy of this software and associated documentation     */
/*  files (the "Software"), to deal in the Software without            */
/*  restriction, including without limitation the rights to use,       */
/*  copy, modify, merge, publish, distribute, sublicense, and/or       */
/*  sell copies of the Software, and to permit persons to whom the     */
/*  Software is furnished to do so, subject to the following           */
/*  conditions:                                                         */
/*                                                                      */
/*  The above copyright notice and this permission notice shall be     */
/*  included in all copies or substantial portions of the              */
/*  Software.                                                           */
/*                                                                      */
/*  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND     */
/*  EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES    */
/*  OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND           */
/*  NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT        */
/*  HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,       */
/*  WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING       */
/*  FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR      */
/*  OTHER DEALINGS IN THE SOFTWARE.
*/
/*                                                                      */
/************************************************************************/

/// \file multithread.h
///
/// \brief code to distribute the processing of bulk data to several threads
///
/// The code in this header provides a reasonably general method to perform
/// processing of manifolds of data with several threads in parallel. In vspline,
/// there are several areas where potentially large numbers of individual values
/// have to be processed independently of each other or in a dependence which
/// can be preserved in partitioning. To process such 'bulk' data effectively,
/// vspline employs two strategies: multithreading and vectorization.
/// This file handles the multithreading.
///
/// To produce generic code for the purpose, we first introduce a model of what
/// we intend to do. This model looks at the data as occupying a 'range' having
/// a defined starting point and end point. We keep with the convention of defining
/// ranges so that the start point is inside and the end point outside the data
/// set described by the range, just like iterators obtained by begin() and end().
/// This range is made explicit, even if it is implicit in the data which we want to
/// submit to multithreading, and there is a type for the purpose: struct range_type.
/// range_type merely captures the concept of a range, taking 'limit_type' as its
/// template parameter, so that any type of range can be accommodated. A range is
/// defined by its lower and upper limit.
///
/// Next we define an object holding a set of ranges, modeling a partitioning of
/// an original/whole range into subranges, which, within the context of this code,
/// are disparate and in sequence. This object is modeled as struct partition_type,
/// taking a range_type as its template argument.
///
/// With these types, we model concrete ranges and partitionings.
/// The most important
/// one is dealing with multidimensional shapes, where a range extends from a 'lower'
/// coordinate to just below a 'higher' coordinate. These two coordinates can be
/// used directly to call vigra's 'subarray' function.
///
/// Next we provide code to partition ranges into partitionings (sets of subranges).
///
/// Finally we can express a generalized multithreading routine. This routine takes
/// a functor capable of processing a range specification and a parameter pack of
/// arbitrary further parameters, some of which will usually be referring to manifolds
/// of data for which the given range makes sense. We call this routine with a
/// partitioning of the original range and the same parameter pack that is to be passed
/// on to the functor. The multithreading routine proceeds to set up 'tasks' as needed,
/// providing each with the functor as its functional, a subrange from
/// the partitioning, and the parameter pack as arguments. The routine to be used
/// to partition the 'whole' range is passed in. So, to reiterate: we don't partition
/// the data, but only the information about the extent of the data. The multithreading
/// routine itself merely operates on this extent information; the 'meaning' of the
/// extent information is only known by the functor which is invoked in the worker
/// threads, and it is only put to use when the functor gets invoked, receiving
/// its 'range' as the first argument. With this strategy we can handle the
/// process uniformly for a wide range of situations.
///
/// The tasks, once prepared, are handed over to a 'joint_task' object which handles
/// the interaction with the thread pool (in thread_pool.h). While my initial code
/// used one thread per task, this turned out to be inefficient, because it was not
/// granular enough: the slowest thread became the limiting factor.
/// Now the job at hand is split
/// into more individual tasks (something like 8 times the number of cores), resulting
/// in a fair compromise concerning granularity. multithread() waits for all tasks to
/// terminate and returns when it's certain that the job is complete. While it's feasible
/// to code so that the multithreading routine does not block, I have chosen not to
/// implement such behaviour for now. For many tasks, blocking is necessary anyway:
/// consider nD filtering, where filtering along axis 1 can only start once filtering
/// along axis 0 is complete. For transforms, launching a transform would be safe while
/// another transform is running, provided the invocations don't get into each other's
/// way. But waiting for the first transform to terminate is certainly safe. So unless I
/// see a good reason for doing it differently, I'll stick with the blocking
/// multithreading routine. Of course it's possible to launch several threads which
/// call multithread() synchronously, and these calls won't block each other. But then
/// the user has to make sure the threads don't step on each other's toes.
///
/// The use of multithreading can be suppressed by defining VSPLINE_SINGLETHREAD. While
/// this is less efficient than avoiding calling multithread() altogether, it's easy and
/// doesn't cost much performance. Since defining VSPLINE_SINGLETHREAD excludes all
/// multithreading-related code, linking can omit -pthread. This option may be helpful
/// in debugging.
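The range/partition/worker pattern described above can be sketched in a few lines of plain C++. This is a free-standing illustration, not vspline's API: `index_range`, `split` and `sum_part` are names invented for the example, a thread is spawned per part instead of using vspline's thread pool, and the splitting logic is a much simplified analogue of index_range_splitter below.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// a 1D range [first,second), analogous to vspline's index_range_type

using index_range = std::pair < std::ptrdiff_t , std::ptrdiff_t > ;

// partition a range into up to njobs contiguous subranges, larger
// parts first - a much simplified analogue of index_range_splitter

std::vector < index_range > split ( index_range r , int njobs )
{
  std::vector < index_range > parts ;
  std::ptrdiff_t n = r.second - r.first ;
  std::ptrdiff_t lo = r.first ;
  for ( int j = 0 ; j < njobs ; j++ )
  {
    std::ptrdiff_t sz = n / njobs + ( j < n % njobs ? 1 : 0 ) ;
    if ( sz > 0 )
      parts.push_back ( { lo , lo + sz } ) ;
    lo += sz ;
  }
  return parts ;
}

// the single-threaded 'payload': it only learns which part of the
// data to process from the range it receives as first argument

void sum_part ( index_range r ,
                const std::vector < double > & in ,
                double * out )
{
  double s = 0.0 ;
  for ( auto i = r.first ; i < r.second ; i++ )
    s += in [ i ] ;
  *out = s ;
}
```

The caller partitions the extent information once, hands each subrange (plus the shared arguments) to a worker, and blocks until all parts have completed - the same contract multithread() implements, only with a thread pool instead of per-task threads.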
#ifndef VSPLINE_MULTITHREAD_H
#define VSPLINE_MULTITHREAD_H

#include <assert.h>

#ifndef VSPLINE_SINGLETHREAD

// only include multithreading-related headers if VSPLINE_SINGLETHREAD
// is not defined

#include <thread>
#include <mutex>
#include <queue>
#include <condition_variable>
#include "thread_pool.h"

#endif // #ifndef VSPLINE_SINGLETHREAD

#include <vector>
#include <functional>
#include "common.h"

namespace vspline
{

/// number of CPU cores in the system

#ifdef VSPLINE_SINGLETHREAD
const int ncores = 1 ;
#else
const int ncores = std::thread::hardware_concurrency() ;
#endif

/// when multithreading, use this number of jobs per default. This is
/// an attempt at a compromise: too many jobs will produce too much overhead,
/// too few will not distribute the load well and make the system vulnerable
/// to 'straggling' threads

#ifdef VSPLINE_SINGLETHREAD
const int default_njobs = 1 ;
#else
const int default_njobs = 8 * ncores ;
#endif

// next we have partitioning code. This is used for multithreading, but not
// multithreading-specific, so we don't exclude it even if VSPLINE_SINGLETHREAD
// is defined

/// given limit_type, we define range_type as a TinyVector of two limit_types,
/// the first denoting the beginning of the range and the second its end, with
/// 'end' being just outside of the range.

template < class limit_type >
using range_type = vigra::TinyVector < limit_type , 2 > ;

/// given range_type, we define partition_type as a std::vector of range_type.
/// This data type is used to hold the partitioning of a range into subranges.

template < class range_type >
using partition_type = std::vector < range_type > ;

// now we define a few specific range and partition types:

/// index_range_type is for a simple range from one (1D) index to another

typedef range_type < std::ptrdiff_t > index_range_type ;

/// and index_partition_type is the corresponding partition type

typedef partition_type < index_range_type > index_partition_type ;

/// index_range_splitter partitions an index range.
The split is performed /// so that, if possible, the partitions contain a multiple of 'vsize' indexes. /// We use the construct of 'bundles' containing vsize indexes each. The /// jobs are set up to contain at least one bundle each, if possible. /// If the partitioning produces chunks of different sizes, they are set up /// so that larger chunks are first in the partitioning, which produces /// slight 'tapering' - having smaller jobs towards the end. The tapering /// might be made more pronounced to increase efficiency. struct index_range_splitter { static index_partition_type part ( const index_range_type & range , std::ptrdiff_t njobs , const std::ptrdiff_t & vsize ) { // total number of indexes std::ptrdiff_t ni = range[1] - range[0] ; // we want to group the indexes in 'bundles' of vsize indexes. // total number of bundles: std::ptrdiff_t bundles = ni / vsize ; // leftover indexes leftover after forming bundles std::ptrdiff_t remainder = ni % vsize ; if ( bundles == 0 ) { // we can't even form one bundle, // so it's just one job with the whole range index_partition_type res ( 1 ) ; res[0] = range ; return res ; } // we have at least one bundle. all jobs get a basic allotment of // 'socket_size' (possibly 0) bundles std::ptrdiff_t socket_size = bundles / njobs ; // we may have 'leftover' bundles which have not been shared out // to the basic allotments std::ptrdiff_t leftover = bundles % njobs ; // if socket_size is 0, we adjust *njobs* so that each job // gets one of the 'leftover' bundles. Since we have taken // care of the case that we can't even form a single bundle, // we can be certain that njobs won't decrease to zero. 
if ( socket_size == 0 ) njobs = leftover ; // now we set up the index partitioning to hold 'njobs' subranges index_partition_type res ( njobs ) ; // and fill in the ranges for the jobs one by one for ( int j = 0 ; j < njobs ; j++ ) { // calculate the number of indexes going into this subrange int sz = socket_size * vsize ; if ( j < leftover ) sz += vsize ; // the very first subrange starts at range[0] and gets the remainder // on top. all other subranges start where the previous subrange ends if ( j == 0 ) { res[j][0] = range[0] ; sz += remainder ; } else { res[j][0] = res[j-1][1] ; } // all subranges end 'sz' after their beginning res[j][1] = res[j][0] + sz ; } // TODO: being paranoid here, may remove assertions for production code assert ( res[0][0] == range[0] ) ; // doublecheck assert ( res[njobs-1][1] == range[1] ) ; // doublecheck return res ; } } ; /// given a dimension, we define a shape_type as a TinyVector of /// vigra::MultiArrayIndex of this dimension. /// This is equivalent to vigra's shape type. // TODO: might instead define as: vigra::MultiArrayShape template < int dimension > using shape_type = vigra::TinyVector < vigra::MultiArrayIndex , dimension > ; /// given a dimension, we define shape_range_type as a range defined by /// two shapes of the given dimension. This definition allows us to directly /// pass the two shapes as arguments to a call of subarray() on a MultiArrayView /// of the given dimension. Note the subarray semantics: if the range is /// [2,2] to [4,4], it refers to elements [2,2], [3,2], [2,3], [3,3]. template < int dimension > using shape_range_type = range_type < shape_type < dimension > > ; template < int dimension > using shape_partition_type = partition_type < shape_range_type < dimension > > ; /// partition_to_stripes splits an nD region (defined by it's beginning and /// end nD index, passed in 'range') into nparts (or less) parts. 
The split /// is performed along the highest possible axis, to make the resulting /// chunks span as little memory as possible. template < int D > partition_type < shape_range_type < D > > partition_to_stripes ( shape_range_type < D > range , int nparts ) { // shortcut if nparts <= 1 if ( nparts <= 1 ) { partition_type < shape_range_type < D > > res ( 1 ) ; res[0] = range ; return res ; } // get the shape of the range (range[0] may not be the origin) auto shape = range[1] - range[0] ; // find the highest dimension that is at least nparts large int dmax ; for ( dmax = D - 1 ; dmax >= 0 ; dmax-- ) { if ( shape[dmax] >= nparts ) break ; } // if dmax is -1, there was no such dimension. // Try again with lowered nparts if nparts is greater than ncores, // otherwise just split the largest dimension into as many parts // as it's extent is large. if ( dmax == -1 ) { if ( ncores < nparts ) return partition_to_stripes ( range , ncores ) ; int nparts = -1 ; for ( int d = D - 1 ; d >= 0 ; d-- ) { if ( shape[d] > nparts ) { nparts = shape[d] ; dmax = d ; } } } // now we have dmax, the dimension to split, and nparts, the number of // parts to split it into. We delegate this task to index_range_splitter, // passing a range from the lower to the upper index along dmax and // vsize == 1, since we're not bundling. 
index_range_type index_range ; index_range [ 0 ] = range [ 0 ] [ dmax ] ; index_range [ 1 ] = range [ 1 ] [ dmax ] ; index_partition_type index_partition = index_range_splitter::part ( index_range , nparts , 1 ) ; // we note how many parts index_range_splitter has produced and set up // the shape partition we want to return to have just as many entries int size = index_partition.size() ; shape_partition_type < D > shape_partition ( size ) ; // now we fill in these entries: initially each entry gets the whole range, // then the index along dmax, the splitting axis, is adjusted to the // entries in the index partition for ( int r = 0 ; r < size ; r++ ) { shape_partition [ r ] [ 0 ] = range [ 0 ] ; shape_partition [ r ] [ 1 ] = range [ 1 ] ; shape_partition [ r ] [ 0 ] [ dmax ] = index_partition [ r ] [ 0 ] ; shape_partition [ r ] [ 1 ] [ dmax ] = index_partition [ r ] [ 1 ] ; } // TODO: being paranoid here, may remove assertions for production code assert ( shape_partition [ 0 ] [ 0 ] == range [ 0 ] ) ; // doublecheck assert ( shape_partition [ size - 1 ] [ 1 ] == range [ 1 ] ) ; // doublecheck return shape_partition ; } /// alternative partitioning into tiles. For the optimal situation, where /// the view isn't rotated or pitched much, the partitioning into bunches /// of lines (above) seems to perform slightly better, but with more difficult /// transformations (like 90 degree rotation), performance suffers (like, -20%), /// whereas with this tiled partitioning it is roughly the same, supposedly due /// to identical locality in both cases. So currently I am using this partitioning. /// note that the current implementation ignores the argument 'nparts' and /// produces tiles 160X160. /// Note that this routine is not currently used in vspline // TODO code is a bit clumsy... 
// TODO it may be a good idea to have smaller portions towards the end
// of the partitioning, since they will be processed last, and if the
// last few single-threaded operations are short, they may result in fewer
// situations where a long single-threaded operation has just started when
// all other tasks are already done, causing the system to idle on the other
// cores. Or at least the problem would not persist for so long. 'tapering'

// TODO this is quite specific to pv, might be moved out or made more general

template < int d >
partition_type < shape_range_type < d > >
partition_to_tiles ( shape_range_type < d > range ,
                     int nparts = default_njobs )
{
  // shortcut if nparts <= 1

  if ( nparts <= 1 )
  {
    partition_type < shape_range_type < d > > res ( 1 ) ;
    res[0] = range ;
    return res ;
  }

  // To help with the dilemma that this function is really quite specific
  // for images, for the time being I delegate to partition_to_stripes()
  // for dimensions != 2

  if ( d != 2 )
    return partition_to_stripes ( range , nparts ) ;

  auto shape = range[1] - range[0] ;

  // currently disregarding incoming nparts parameter:
  // int nelements = prod ( shape ) ;
  // int ntile = nelements / nparts ;
  // int nedge = pow ( ntile , ( 1.0 / d ) ) ;

  // TODO fixing this size is system-specific!
  int nedge = 160 ; // heuristic, fixed size tiles

  auto tiled_shape = shape / nedge ;

  typedef std::vector < int > stopv ;
  stopv stops [ d ] ;
  for ( int a = 0 ; a < d ; a++ )
  {
    stops[a].push_back ( 0 ) ;
    for ( int k = 1 ; k < tiled_shape[a] ; k++ )
      stops[a].push_back ( k * nedge ) ;
    stops[a].push_back ( shape[a] ) ;
  }

  for ( int a = 0 ; a < d ; a++ )
    tiled_shape[a] = stops[a].size() - 1 ;

  int k = prod ( tiled_shape ) ;

  // If this partitioning scheme fails to produce a partitioning with
  // at least nparts components, fall back to using partition_to_stripes()

  if ( k < nparts )
    return partition_to_stripes ( range , nparts ) ;

  nparts = k ;
  partition_type < shape_range_type < d > > res ( nparts ) ;

  for ( int a = 0 ; a < d ; a++ )
  {
    int j0 = 1 ;
    for ( int h = 0 ; h < a ; h++ )
      j0 *= tiled_shape[h] ;
    int i = 0 ;
    int j = 0 ;
    for ( int k = 0 ; k < nparts ; k++ )
    {
      res[k][0][a] = stops[a][i] ;
      res[k][1][a] = stops[a][i+1] ;
      ++j ;
      if ( j == j0 )
      {
        j = 0 ;
        ++i ;
        if ( i >= tiled_shape[a] )
          i = 0 ;
      }
    }
  }

  for ( auto & e : res )
  {
    e[0] += range[0] ;
    e[1] += range[0] ;
  }

  return res ;
}

#ifdef VSPLINE_SINGLETHREAD

// if multithreading is suppressed by VSPLINE_SINGLETHREAD,
// we use fallback code. multithreading-specific code will not be
// referenced at all, the relevant headers aren't included.

// TODO while here is a central location where multithreading can be
// switched off easily, it would be more efficient to modify the code
// calling multithread to avoid it altogether. The performance gain
// should be minimal, though, since this is not inner-loop code.

/// fallback routine for multithread() if VSPLINE_SINGLETHREAD
/// is defined to deactivate multithreading. This overload simply passes
/// the 'whole' range to the single-thread function, and 'partition'
/// is not called at all.

template < class range_type , class ...Types >
int multithread ( void (*pfunc) ( range_type , Types...
                  ) ,
                  partition_type < range_type > (*partition) ( range_type , int ) ,
                  int nparts ,
                  range_type range ,
                  Types ...args )
{
  (*pfunc) ( range , args... ) ;
  return 1 ;
}

/// fallback routine for multithread() if VSPLINE_SINGLETHREAD
/// is defined to deactivate multithreading. Here, partitioning has
/// already occurred, but it's not used: a range is formed spanning
/// all the ranges in the partitioning and the single-thread function
/// is invoked with this range.

// TODO might still honour the partitioning and call pfunc for all
// parts in turn

template < class range_type , class ...Types >
int multithread ( void (*pfunc) ( range_type , Types... ) ,
                  partition_type < range_type > partitioning ,
                  Types ...args )
{
  int nparts = partitioning.size() ;
  range_type range ( partitioning[0][0] , partitioning[nparts-1][1] ) ;
  (*pfunc) ( range , args... ) ;
  return nparts ;
}

#else // ifdef VSPLINE_SINGLETHREAD

// we start out with some collateral code:

/// action_wrapper wraps a functional into an outer function which
/// first calls the functional and then checks if this was the last
/// of a bunch of actions to complete, by incrementing the counter
/// p_done points to and comparing the result to 'nparts'. If the
/// test succeeds, the caller is notified via the condition variable
/// p_pool_cv points to, under the mutex p_pool_mutex points to.

static void action_wrapper ( std::function < void() > payload ,
                             int nparts ,
                             std::mutex * p_pool_mutex ,
                             std::condition_variable * p_pool_cv ,
                             int * p_done )
{
  // execute the 'payload'

  payload() ;

  // under the coordinator's pool mutex, increase the caller's
  // 'done' counter and test if it's now equal to 'nparts', the total
  // number of actions in this bunch

  // TODO initially I had the notify_all call after closing the scope of
  // the lock guard, but I had random crashes. Changing the code to call
  // notify_all with the lock guard still in effect seemed to remove the
  // problem, but made me unsure of my logic.
  // 2017-06-23 after removing a misplaced semicolon after the conditional
  // below I recoded to perform the notification after closing the lock_guard's
  // scope, and now there doesn't seem to be any problem any more. I leave
  // these comments in for reference in case things go wrong
  // TODO remove this and previous comment if all is well
  // 2017-10-12 when stress-testing with restore_test, I had random crashes
  // and failure to join again, so I've taken the notify call into the lock
  // guard's scope again to see if that fixes it, which seems to be the case.

  {
    std::lock_guard < std::mutex > lk ( * p_pool_mutex ) ;
    if ( ++ ( * p_done ) == nparts )
    {
      // this was the last action originating from the coordinator
      // notify the coordinator that the joint task is now complete

      p_pool_cv->notify_one() ;
    }
  }
}

// with this collateral code at hand, we can now implement multithread().

/// multithread uses a thread pool of worker threads to perform
/// a multithreaded operation. It receives a functor (a single-threaded
/// function used for all individual tasks), a partitioning, which contains
/// information about which part of the data each task should work on, and
/// a set of additional parameters to pass on to the functor.
/// The individual 'payload' tasks are created by binding the functor with
///
/// - a range from the partitioning, describing its share of the data
///
/// - the remaining parameters
///
/// These tasks are bound to a wrapper routine which takes care of
/// signalling when the last task has completed.

// TODO may write an equivalent function taking an iterator yielding ranges
// instead of a container with ranges.

static thread_pool common_thread_pool ; // keep a thread pool only for multithread()

template < class range_type , class ... Types >
int multithread ( void (*pfunc) ( range_type , Types ... ) ,
                  partition_type < range_type > partitioning ,
                  Types ...
                  args )
{
  // get the number of ranges in the partitioning

  int nparts = partitioning.size() ;

  // guard against empty or wrong partitioning

  if ( nparts <= 0 )
  {
    return 0 ;
  }

  if ( nparts == 1 )
  {
    // if only one part is in the partitioning, we take a shortcut
    // and execute the function right here:

    (*pfunc) ( partitioning[0] , args ... ) ;
    return 1 ;
  }

  // alternatively, 'done' can be coded as std::atomic. I tried
  // but couldn't detect any performance benefit, even though allegedly
  // atomics are faster than using mutexes... so I'm leaving the code
  // as it was, using an int and a mutex.

  int done = 0 ;                    // number of completed tasks
  std::mutex pool_mutex ;           // mutex to guard access to done and pool_cv
  std::condition_variable pool_cv ; // for signalling completion

  {
    // under the thread pool's task_mutex, fill tasks into task queue

    std::lock_guard < std::mutex > lk ( common_thread_pool.task_mutex ) ;

    for ( int i = 0 ; i < nparts ; i++ )
    {
      // first create the 'payload' function

      std::function < void() > payload
        = std::bind ( pfunc , partitioning[i] , args ... ) ;

      // now bind it to the action wrapper and enqueue it

      std::function < void() > action
        = std::bind ( action_wrapper , payload , nparts ,
                      &pool_mutex , &pool_cv , &done ) ;

      common_thread_pool.task_queue.push ( action ) ;
    }
  }

  // alert all worker threads

  common_thread_pool.task_cv.notify_all() ;

  {
    // now wait for the last task to complete. This is signalled by
    // action_wrapper by notifying on pool_cv and doublechecked
    // by testing for done == nparts

    std::unique_lock < std::mutex > lk ( pool_mutex ) ;

    // the predicate done == nparts rejects spurious wakes

    pool_cv.wait ( lk , [&] { return done == nparts ; } ) ;
  }

  // all jobs are done

  return nparts ;
}

/// This variant of multithread() takes a pointer to a function performing
/// the partitioning of the incoming range.
/// The partitioning function is
/// invoked on the incoming range (provided nparts is greater than 1) and
/// the resulting partitioning is used as an argument to the first variant
/// of multithread().

// TODO It might be better to code this using std::function objects.
// TODO may use move semantics for forwarding instead of relying on the
// optimizer to figure this out

template < class range_type , class ... Types >
int multithread ( void (*pfunc) ( range_type , Types ... ) ,
                  partition_type < range_type > (*partition) ( range_type , int ) ,
                  int nparts ,
                  range_type range ,
                  Types ... args )
{
  if ( nparts <= 1 )
  {
    // if only one part is requested, we take a shortcut and execute
    // the function right here:

    (*pfunc) ( range , args ... ) ;
    return 1 ;
  }

  // partition the range using the function pointed to by 'partition'

  auto partitioning = (*partition) ( range , nparts ) ;

  // then pass pfunc, the partitioning and the remaining arguments
  // to the variant of multithread() above accepting a partitioning

  return multithread ( pfunc , partitioning , args ... ) ;
}

#endif

} ; // end of namespace vspline

#endif // #ifndef VSPLINE_MULTITHREAD_H

kfj-vspline-4b365417c271/poles.h

/*! \file poles.h

    \brief precalculated prefilter poles and basis function values

    The contents of this file below the comments can be generated using
    bootstrap.cc. Both the precalculated basis function values and the
    prefilter poles can be generated in arbitrary precision, so the
    constants below are given as 'xlf_type' - concrete splines will
    downcast them to whatever precision they use for prefiltering
    (usually the same type as the data type's elementary type).
    The values defined here are used in several places in vspline.
    They are precomputed because calculating them when needed can be
    (potentially very) expensive, and providing them by definitions
    evaluated at compile time slows compilation. Great care is taken
    to provide very exact values. The code in bootstrap.cc goes beyond
    simply calculating polynomial roots with gsl/blas, adding a
    polishing stage done in arbitrary precision arithmetic. The basis
    function values are done as GMP fractions, and only divided out
    before the result is printed out.

    The set of values provided here is sufficient to calculate the
    b-spline basis function for all spline degrees for arbitrary
    arguments (including the spline's derivatives) - see basis.h.
    The poles are needed for prefiltering.

    Note: I have calculated the basis function values and prefilter
    poles using the arbitrary-precision library GNU GMP. The precision
    of these values is way beyond long double and well sufficient to
    initialize quad floats (__float128) up to their precision, which
    is a mantissa of about 113 bits. So now I express the literals
    below with a macro which adds a suffix to them so they can be
    interpreted as literals of the precision wanted for the given
    compile.
*/

#include "common.h" // to have xlf_type and macro XLF()

#ifndef VSPLINE_POLES_H
#define VSPLINE_POLES_H

namespace vspline_constants
{

using vspline::xlf_type ;

/// to allow code to inquire about the maximal spline degree possible

static const int max_degree { 45 } ;

// For long doubles, the number of postcomma digits I provide is clearly
// total overkill, but it leaves enough room to manoeuvre if the need for
// more precise values should arise - and when assigned to 'lesser' types
// the value is rounded to what is actually used.
// I have refrained from giving the basis function values exact (as fractions
// of integral values) - if this should be needed, the way to obtain these
// values can easily be gleaned by looking at bootstrap.cc, which does
// initially compute such fractions and only casts them to floating point
// before echoing them.
// For most real applications, the number of degrees covered is also overkill. // TODO how to communicate K0[-.5]==1? const xlf_type K0[] = { XLF(1) , } ; const xlf_type K1[] = { XLF(1) , XLF(0.5) , } ; const xlf_type K2[] = { XLF(0.75) , XLF(0.5) , XLF(0.125) , } ; const xlf_type K3[] = { XLF(0.6666666666666666666666666666666666666666666666666666666666666667) , XLF(0.4791666666666666666666666666666666666666666666666666666666666667) , XLF(0.1666666666666666666666666666666666666666666666666666666666666667) , XLF(0.02083333333333333333333333333333333333333333333333333333333333333) , } ; const xlf_type K4[] = { XLF(0.5989583333333333333333333333333333333333333333333333333333333333) , XLF(0.4583333333333333333333333333333333333333333333333333333333333333) , XLF(0.1979166666666666666666666666666666666666666666666666666666666667) , XLF(0.04166666666666666666666666666666666666666666666666666666666666667) , XLF(0.002604166666666666666666666666666666666666666666666666666666666667) , } ; const xlf_type K5[] = { XLF(0.55) , XLF(0.4380208333333333333333333333333333333333333333333333333333333333) , XLF(0.2166666666666666666666666666666666666666666666666666666666666667) , XLF(0.06171875) , XLF(0.008333333333333333333333333333333333333333333333333333333333333333) , XLF(0.0002604166666666666666666666666666666666666666666666666666666666667) , } ; const xlf_type K6[] = { XLF(0.5110243055555555555555555555555555555555555555555555555555555556) , XLF(0.4194444444444444444444444444444444444444444444444444444444444444) , XLF(0.2287977430555555555555555555555555555555555555555555555555555556) , XLF(0.07916666666666666666666666666666666666666666666666666666666666667) , XLF(0.01566840277777777777777777777777777777777777777777777777777777778) , XLF(0.001388888888888888888888888888888888888888888888888888888888888889) , XLF(2.170138888888888888888888888888888888888888888888888888888888889e-05) , } ; const xlf_type K7[] = { 
XLF(0.4793650793650793650793650793650793650793650793650793650793650794) , XLF(0.4025964161706349206349206349206349206349206349206349206349206349) , XLF(0.2363095238095238095238095238095238095238095238095238095238095238) , XLF(0.09402436755952380952380952380952380952380952380952380952380952381) , XLF(0.02380952380952380952380952380952380952380952380952380952380952381) , XLF(0.003377666170634920634920634920634920634920634920634920634920634921) , XLF(0.0001984126984126984126984126984126984126984126984126984126984126984) , XLF(1.550099206349206349206349206349206349206349206349206349206349206e-06) , } ; const xlf_type K8[] = { XLF(0.4529209681919642857142857142857142857142857142857142857142857143) , XLF(0.3873759920634920634920634920634920634920634920634920634920634921) , XLF(0.2407776847718253968253968253968253968253968253968253968253968254) , XLF(0.1064732142857142857142857142857142857142857142857142857142857143) , XLF(0.03212696862599206349206349206349206349206349206349206349206349206) , XLF(0.006125992063492063492063492063492063492063492063492063492063492063) , XLF(0.000634765625) , XLF(2.48015873015873015873015873015873015873015873015873015873015873e-05) , XLF(9.68812003968253968253968253968253968253968253968253968253968254e-08) , } ; const xlf_type K9[] = { XLF(0.430417768959435626102292768959435626102292768959435626102292769) , XLF(0.3736024025676532186948853615520282186948853615520282186948853616) , XLF(0.2431492504409171075837742504409171075837742504409171075837742504) , XLF(0.1168385769744819223985890652557319223985890652557319223985890653) , XLF(0.04025573192239858906525573192239858906525573192239858906525573192) , XLF(0.009453129305831128747795414462081128747795414462081128747795414462) , XLF(0.001383377425044091710758377425044091710758377425044091710758377425) , XLF(0.0001058857697448192239858906525573192239858906525573192239858906526) , XLF(2.755731922398589065255731922398589065255731922398589065255731922e-06) , 
XLF(5.382288910934744268077601410934744268077601410934744268077601411e-09) , } ; const xlf_type K10[] = { XLF(0.4109626428244185405643738977072310405643738977072310405643738977) , XLF(0.3610984347442680776014109347442680776014109347442680776014109347) , XLF(0.2440661561888571979717813051146384479717813051146384479717813051) , XLF(0.1254387125220458553791887125220458553791887125220458553791887125) , XLF(0.04798334892044201940035273368606701940035273368606701940035273369) , XLF(0.01318342151675485008818342151675485008818342151675485008818342152) , XLF(0.00245328523074087852733686067019400352733686067019400352733686067) , XLF(0.0002791556437389770723104056437389770723104056437389770723104056437) , XLF(1.58879786361882716049382716049382716049382716049382716049382716e-05) , XLF(2.755731922398589065255731922398589065255731922398589065255731922e-07) , XLF(2.691144455467372134038800705467372134038800705467372134038800705e-10) , } ; const xlf_type K11[] = { XLF(0.3939255651755651755651755651755651755651755651755651755651755652) , XLF(0.3497022318874430690836940836940836940836940836940836940836940837) , XLF(0.2439602873977873977873977873977873977873977873977873977873977874) , XLF(0.132561165432106594215969215969215969215969215969215969215969216) , XLF(0.0552020202020202020202020202020202020202020202020202020202020202) , XLF(0.01716314960753132139850889850889850889850889850889850889850889851) , XLF(0.003823878667628667628667628667628667628667628667628667628667628668) , XLF(0.0005712862612632713544171877505210838544171877505210838544171877505) , XLF(5.100609267275933942600609267275933942600609267275933942600609267e-05) , XLF(2.16679942326914983164983164983164983164983164983164983164983165e-06) , XLF(2.505210838544171877505210838544171877505210838544171877505210839e-08) , XLF(1.22324747975789642456309122975789642456309122975789642456309123e-11) , } ; const xlf_type K12[] = { XLF(0.3788440845447299915073352573352573352573352573352573352573352573) , 
XLF(0.3392729502364919031585698252364919031585698252364919031585698252) , XLF(0.243130918010144694715007215007215007215007215007215007215007215) , XLF(0.1384514665504248837582170915504248837582170915504248837582170916) , XLF(0.06186766800904132548826559243225909892576559243225909892576559243) , XLF(0.0212685824013949013949013949013949013949013949013949013949013949) , XLF(0.005458186925696725230145369034257923146812035700924589813478702368) , XLF(0.0009984747441344663566885789108011330233552455774677996900219122441) , XLF(0.0001209139205918753716062743840521618299396077173854951632729410507) , XLF(8.523979878146544813211479878146544813211479878146544813211479878e-06) , XLF(2.708616506969914087969643525199080754636310191865747421302976859e-07) , XLF(2.087675698786809897921009032120143231254342365453476564587675699e-09) , XLF(5.096864498991235102346213457324568435679546790657901769012880124e-13) , } ; const xlf_type K13[] = { XLF(0.3653708694854528187861521194854528187861521194854528187861521195) , XLF(0.3296898795859100119354025604025604025604025604025604025604025604) , XLF(0.241788417986334653001319667986334653001319667986334653001319668) , XLF(0.1433150174717420836602151706318372985039651706318372985039651706) , XLF(0.06797496725882142548809215475882142548809215475882142548809215476) , XLF(0.02540440629498498879873662859773970885081996193107304218415329526) , XLF(0.007312236695917251472807028362583918139473695029250584806140361696) , XLF(0.001567173108165633054413436357880802325246769691214135658580103025) , XLF(0.000237629847004847004847004847004847004847004847004847004847004847) , XLF(2.349228542020798693975777309110642443975777309110642443975777309e-05) , XLF(1.313308604975271641938308604975271641938308604975271641938308605e-06) , XLF(3.125375747123929632610188165743721299276854832410387965943521499e-08) , XLF(1.605904383682161459939237717015494793272571050348828126605904384e-10) , XLF(1.960332499612013500902389791278680167569056457945346834235723125e-14) , } ; 
const xlf_type K14[] = { XLF(0.3532391566991892985022170290027432884575741718598861456004313147) , XLF(0.3208502450206319253938301557349176396795444414492033539652587272) , XLF(0.2400829904155873420494246852133756895661657566419471181375943281) , XLF(0.1473218009462429105286248143391000533857676714819571962429105286) , XLF(0.07354103256406706097994152929668802684675700548716421732294748168) , XLF(0.0294998002323771172977522183871390220596569802919009268215617422) , XLF(0.009341081854512256904689707889112651017412922174826936731698636461) , XLF(0.00227591965005159449603894048338492782937227381671826116270560715) , XLF(0.0004110905114937219671610172602236094299586363078426570490062553555) , XLF(5.204637459101744816030530316244601958887673173387459101744816031e-05) , XLF(4.22295610849360418239076473203457330441457425584409711393838378e-06) , XLF(1.877646346892378638410384442130473876505622537368569114600860633e-07) , XLF(3.348635775124742293641103164912688722212531736341260150783960308e-09) , XLF(1.147074559772972471385169797868210566623265035963448661861360274e-11) , XLF(7.001187498614333931794249254566714884175201635519095836556154016e-16) , } ; const xlf_type K15[] = { XLF(0.3422402613553407204200854994505788156581807375458169108962759756) , XLF(0.3126666062517608097457825027889512016496143480270464397448524433) , XLF(0.2381231949107073115009622946130882638819146755654692162628670565) , XLF(0.1506119498039969868420469988136985491482845980200477554974909472) , XLF(0.07859525386674857574328473799373270272741172212071682971153870625) , XLF(0.03350380257164983552592802293116446555599994753433906873060312214) , XLF(0.01150227448749687506301262915019528776142532756289370045983802598) , XLF(0.003117493948498863912897025599571895868192164488460784757081053377) , XLF(0.0006485490063532391574719617047659375701703744031786359828687871016) , XLF(9.944024943894646248956299799553767807736061704315672569640823609e-05) , 
XLF(1.057200426826749578072329395080717832040583363334686086008837332e-05) , XLF(7.068397902798796317960711148541836372524203212033899864587695276e-07) , XLF(2.504599065445626292187138747985308831869678430524991371552218113e-08) , XLF(3.348642542939324287497237232686968136703586439036174485909935645e-10) , XLF(7.647163731819816475901131985788070444155100239756324412409068494e-13) , XLF(2.333729166204777977264749751522238294725067211839698612185384672e-17) , } ; const xlf_type K16[] = { XLF(0.332208269142495860354893909213260651752715244778736842228905721) , XLF(0.3050644278149432229293340404451515562626673737784848895960007071) , XLF(0.2359883168766360905058009537279543893300507057120813734570348327) , XLF(0.1533009314401523086212239651393090546529699968853408006847160286) , XLF(0.08317297504551898046847185831664171611261558351505441452531399621) , XLF(0.0373810339101848175095529592884090238587593084947582302079656577) , XLF(0.01375763090948818939943249503228008519013810019101024392029683035) , XLF(0.004080872532107702825295417888010480603073195665788258380850973444) , XLF(0.0009544828678894823993222595420636095239269842444445619048793651968) , XLF(0.0001707270050562771296898281025265152249279233406217533201660185787) , XLF(2.234895063781818710891984081468208452335436462420589404716388843e-05) , XLF(2.004166042122804688942255079821217387354953492519630085767651905e-06) , XLF(1.107471879616850687316905488201784498080794377090673386969683266e-07) , XLF(3.131465753406890973028539166105303671441237578803716369853935992e-09) , XLF(3.139354644805746280407325380870354415327960301505275050248595222e-11) , XLF(4.779477332387385297438207491117544027596937649847702757755667809e-14) , XLF(7.2929036443899311789523429735069946710158350369990581630793271e-19) , } ; const xlf_type K17[] = { XLF(0.3230093941569987066310595722360428242781183957654545889840007487) , XLF(0.2979799587081916278130416052193107199137388991108256592227958552) , 
XLF(0.2337367492306511099792301161737825292135905270047815955067744672) , XLF(0.155484036150159998453674215604964467009284936455804803143738718) , XLF(0.08731164077018230311492060636162721441277308068468977276944908442) , XLF(0.04110806430911691477285533952525836084815232029673361761752673673) , XLF(0.01607392199096478464475663355215175943467259993870638128341209574) , XLF(0.005152823873576680687553350997255034564402756123889021492507331287) , XLF(0.001330812572133536283084991457229241231731116573942593239200771102) , XLF(0.0002704049258301891954659168111932029999256890013192534200937562282) , XLF(4.182154969498986967085317536609164371380368890484047658028361421e-05) , XLF(4.695715379871064022225232694663522550232789884206006210363508838e-06) , XLF(3.56439418392324554789540783938543042184498767131820353108868515e-07) , XLF(1.63149746984798566178052651461459429090926134187883331985417264e-08) , XLF(3.684527190109978781002746144358344794074641569194946101264184053e-10) , XLF(2.770019512081012202911548587318895442144741864629819811892641024e-12) , XLF(2.81145725434552076319894558301032001623349273520453103397392224e-15) , XLF(2.14497166011468564086833616867852784441642206970560534208215503e-20) , } ; const xlf_type K18[] = { XLF(0.3145344008586467182470994721759390932422799490614270847351734027) , XLF(0.2913584466510833033039909617701228172577206017742726253290973006) , XLF(0.2314117793664616011208018282342371123813484708459982520223900727) , XLF(0.1572401134620674563387759777426755329027348946427689035843408704) , XLF(0.0910485005933913615716885011987613695735215948239429236766443254) , XLF(0.04467047496015852986341799150873397856141570911429472137706048554) , XLF(0.01842292869049824747877647746554818208711247408207673795390363499) , XLF(0.006319116410195815530480260569827260701141117505193999989449033264) , XLF(0.001777277655743294328907405968667273922400664455511085521114223953) , XLF(0.0004021980309109744217120500967373049539249352507989296037848792284) , 
XLF(7.138389106911010043993841802455747648022088662542374891512077251e-05) , XLF(9.590710558658019278069767401302484471554568729241067312098193587e-06) , XLF(9.271047742986201030757942245048661689392055612617911611583154274e-07) , XLF(5.973408326006386835656815253592235990133420710589849850492379109e-08) , XLF(2.268507892674943235881177967614201058195940099318734723513919143e-09) , XLF(4.094184626640664611408464505258778523640023795641598318224524262e-11) , XLF(2.308349801939754902465172240520651489977491983232551064622164667e-13) , XLF(1.561920696858622646221636435005733342351940408446961685541067911e-16) , XLF(5.95825461142968233574537824632924401226783908251557039467265286e-22) , } ; const xlf_type K19[] = { XLF(0.3066931017379824245305168018632871760607585281834448687674708427) , XLF(0.2851526574476310860587296937963637403055693926298178395378548994) , XLF(0.2290456456811837762874976005842387482762931187915216614415592252) , XLF(0.1586346253388907509263807001987774318598400180350699349721960716) , XLF(0.09441929511676010574059072210769536873467724560598357290523435734) , XLF(0.04806054542535070027749314354263487094159567827397532378940146726) , XLF(0.02078114937124501636527206989309959152867076270355710786854004375) , XLF(0.007565383412672267476436688637003756922033686930678865854495560569) , XLF(0.00229186688915413342562369814596241229272609466012363233629907941) , XLF(0.0005689522908994850139668336783697942095877293348094534065998675234) , XLF(0.0001134132006807759156700682628792664229841049886806981067687299498) , XLF(1.766303335855916124091566797843863525155535136915603332189459818e-05) , XLF(2.069399345620689421367587370830773454981032748982033959182713421e-06) , XLF(1.727524784354898381135431169596139951972343408450014835258991401e-07) , XLF(9.468329535090553581770731398613740634369548703313398275203660126e-09) , XLF(2.98700491781092245304329419497705988715647939961447040691994034e-10) , 
XLF(4.309815999477244092049078371092556883832458389766693205027603114e-12) , XLF(1.822381480598601402708885037931828837924869579896762959721736475e-14) , XLF(8.220635246624329716955981236872280749220738991826114134426673217e-18) , XLF(1.5679617398499164041435205911392747400704839690830448407033297e-23) , } ; const xlf_type K20[] = { XLF(0.2994102903200126403616661784861819273208478622613087315147476444) , XLF(0.2793216559936422892233820812529748995823404794270593481765929952) , XLF(0.2266624218574869476605655071675697999645534718693286863894938184) , XLF(0.1597221176265887627737283535275246579651382508197748913918422658) , XLF(0.09745755665987275681714501229862720287892930658613929973155924745) , XLF(0.05127546513801330293366313427359288198750689399970554927664477137) , XLF(0.02312933833806029315065469365846561252547111803094898087281005361) , XLF(0.008877709102343649125781813164758545639943033208331530389398370899) , XLF(0.002871240020020613564967878223844321801612052020207485122495961139) , XLF(0.0007726199672568219644396606409481735050559071395476132809662712852) , XLF(0.0001701507308502417288025889042349833499565809643836956126394318825) , XLF(3.000881964669053045461113561648122451001107336936015385952865818e-05) , XLF(4.116703300385090395649698366640374477637672389257233721335151381e-06) , XLF(4.219279492289648548180225958549763705354206661942231803704657954e-07) , XLF(3.04930466565191773845113337099929418996422015953415257776378443e-08) , XLF(1.424128264663112556948453880326044396350881518047799765165073862e-09) , XLF(3.735441850133206772601623462381411800620679753879492514387668031e-11) , XLF(4.309894095512087023181389452914307170499575986787115553111880168e-13) , XLF(1.366786125736578015568348718848197392051580900609790918350989553e-15) , XLF(4.110317623312164858477990618436140374610369495913057067213336609e-19) , XLF(3.91990434962479101035880147784818685017620992270761210175832425e-25) , } ; const xlf_type K21[] = { 
XLF(0.2926226872314347791864002755983546567053090736854907457088117093) , XLF(0.2738298047486301248520951526919982350695841657257629321135251989) , XLF(0.2242800938788327640723124311838116614954792287670424846889541496) , XLF(0.1605482126616445458566516653249882445547559197659697032068896127) , XLF(0.1001942907349272287191036155859252089297063752082926744344267819) , XLF(0.05431596662727297097521766518702938064594091425714263283338996795) , XLF(0.0254519832636627386300119265950172711837741722340136104602254935) , XLF(0.01024300084884529025342592555639213143163848283452980432486984946) , XLF(0.003511107772631327302241314369882496288354278073882614854299174551) , XLF(0.001014304593252987379463396974791920649276444908537901680577854361) , XLF(0.0002436124246613323939957972388358447915474104767235924832166707734) , XLF(4.779783924441349999725995220676069541153413802538429647232882627e-05) , XLF(7.486517779540240704998193438663843849959881817719169683997200449e-06) , XLF(9.075615794391424945072662376098362682196792514083823070055639757e-07) , XLF(8.158790979427597358557916920313210863219040629055967225023021481e-08) , XLF(5.115081906671036385653379444072183706279168158960145063661627315e-09) , XLF(2.038368377509909826756860133780319225084092261199188007164825789e-10) , XLF(4.448223742037239646968769969502515945936285698788057322980930717e-12) , XLF(4.104700189226971566504737364590944316098401990935309222554811715e-14) , XLF(9.762758079241290190873069759561138655553116160837698662271853038e-17) , XLF(1.957294106339126123084757437350543035528747379006217651053969814e-20) , XLF(9.333105594344740500854289232971873452800499815970505004186486309e-27) , } ; const xlf_type K22[] = { XLF(0.2862766140553860396180994778143617912091107187132976108459581625) , XLF(0.2686459402768988972690069184448927800774613888974958190483808453) , XLF(0.2219120730968715060979611327194697511438673971452332762399152158) , XLF(0.1611512144701082551850668779297792877745897802444649824789586209) , 
XLF(0.1026578895342640133547104227551038619086537991021880783507370858) , XLF(0.05718529010480106360418634148198039440637253582503748256149990664) , XLF(0.02773678312000349827109209430261071096587526237665429735429216112) , XLF(0.01164920375903508266244159674128980062706852495365218665892234392) , XLF(0.004206555798261862785562595217373443445503341697332318567521846069) , XLF(0.001294343327409118609982816200479590485601750642943444714161861556) , XLF(0.0003355292819853291234757667976163198042994184174052364461613399537) , XLF(7.222478864637174799817057824874336794244458414739912654218077492e-05) , XLF(1.267138379474814743812758619524345270260373391087546913501845064e-05) , XLF(1.768235057909007774978699647226163509326310745537541932749552096e-06) , XLF(1.899389146704343362916947994981635729797600564672037756910353634e-07) , XLF(1.501020632247148740968885050303232004801915456629623391884245583e-08) , XLF(8.177057743781069783137599576662700924055839148857670382097192615e-10) , XLF(2.783324787685537919728904489122790710856578903445197563608903178e-11) , XLF(5.055709418408792537232297228662272681029592107174143198035496601e-13) , XLF(3.731564309831898297571845231403669542545704584347989807127407075e-15) , XLF(6.656425972240051051886725780081104470099637347626192089883802417e-18) , XLF(8.896791392450573286748897442502468343312488086391898413881680971e-22) , XLF(2.121160362351077386557793007493607602909204503629660228224201434e-28) , } ; const xlf_type K23[] = { XLF(0.2803261985498075449763550453338011618199597101539086807461365343) , XLF(0.2637426945803405774709851371242840212262224230225364555533677855) , XLF(0.219568310050317182059345457216563100953119235697847773119941364) , XLF(0.1615634033143354345355296782934058488164101720828038460110461054) , XLF(0.1048741828768824974917077200019790608450049262606597643761123871) , XLF(0.05988840460067647181741671232875399987553923460575104591170602834) , XLF(0.02997415944907547010323047932422480691579916072739652795379714429) , 
XLF(0.01308540310404733080350013162789040941805397941348749849165530364) , XLF(0.004952309709166372133011210136434428382007661300709243856433414621) , XLF(0.001612408766944430496916070273520439244009368113790382200785911076) , XLF(0.0004473141173413978254716788362428595936623396709178216587108740898) , XLF(0.0001044647630135970383808572583879495282713819589814938091265883733) , XLF(2.022508534437359251907130709315352829663222166539437017576419945e-05) , XLF(3.182890469239906353526003385415435476715047719759058699690679684e-06) , XLF(3.967986612900868319859820172038448533714746810629277438350333539e-07) , XLF(3.785523385292728692998890943114036283525344088881258468933689831e-08) , XLF(2.634673489018393792291355778693645144967301480255390225569200172e-09) , XLF(1.248841049839614108173689591804653470615501898226039837952169645e-10) , XLF(3.63383071656837423722244101841492166896047073476640150844420743e-12) , XLF(5.495958555480875197416042920738434408710892039115259837560028922e-14) , XLF(3.244847040262982602784748395818135908922506285291204207144054213e-16) , XLF(4.341147375275081475269570778207256944352205707819564464344538756e-19) , XLF(3.868170170630684037716911931522812323179342646257347136470296074e-23) , XLF(4.611218179024081275125636972812190441106966312238391800487394422e-30) , } ; const xlf_type K24[] = { XLF(0.2747319735211881015322761845044625221106483573151421412014247766) , XLF(0.2590959338854922461036563119925389272595861077466218841463698393) , XLF(0.2172561221840602086310824889120935501301289661615425483338688315) , XLF(0.1618120821179101653140295045587458734297658566802317585660386843) , XLF(0.1068665667295971209989834979519872671153177378605346062014883966) , XLF(0.06243142585437320943606393291179844634112652806323106512783670979) , XLF(0.03215681632579833790498795030647768103322747540265379844924435315) , XLF(0.01454184959951421604405223650420725484809645947324652722029620552) , 
XLF(0.005742944626624392292869428264589821982483891620507710145501567236) , XLF(0.001967617402838947504046175887816835006180044367136538293731340687) , XLF(0.0005800499627008823707723137113830171281174768806501670069004012335) , XLF(0.0001456354315661878934852098075573658610406565702667437614224947591) , XLF(3.074601805288829237565846842299406208680046317480218105279591668e-05) , XLF(5.3704036096147168717567292035747592497438062055234103411437846e-06) , XLF(7.601697767063152933136584314038388415508418565002778437392003242e-07) , XLF(8.486194900961675149065571673304571540652997645498943848693961555e-08) , XLF(7.204528187097666671112756504305465515558427620459167198742416257e-09) , XLF(4.422918500467296261727955990067205806215573252688189722480887101e-10) , XLF(1.826149993888722191784315859831614492138743775698015517284728258e-11) , XLF(4.545262838830708863913937292381485377450434049219518822745141337e-13) , XLF(5.725363811192343702863351231362992772745712226696837081626895761e-15) , XLF(2.704042907215565690059493141889047633463898284612706169244384294e-17) , XLF(2.713217109998441035406256361235587543807988881412284908288679253e-20) , XLF(1.611737571096118349048713304801171801324726102607227973529290031e-24) , XLF(9.606704539633502656511743693358730085639513150496649584348738378e-32) , } ; const xlf_type K25[] = { XLF(0.2694597712409119359478025644722404843499695520564867595122246328) , XLF(0.2546842927399865634269226362647617781255938203848040467010015573) , XLF(0.214980814251065970705611552309316374205270211459308289187239186) , XLF(0.1619204249078340261497083737117156479867436123934196438304429325) , XLF(0.1086561716445043983998113417529272521137728937772406128457590469) , XLF(0.06482118414842576032066559832985081442903448465106988964315662552) , XLF(0.03427935408543838204261900452741202163923234528817020347212425545) , XLF(0.01600993365737546731718924377109080130106580912254353210674388857) , 
XLF(0.006573045689755602078610204745230059549517155580021595839043945655) , XLF(0.002358636146942911039116225207928651463726756967427738354300813724) , XLF(0.0007344950796361184846041273455426906219268869280757477622182252465) , XLF(0.0001967670421694020475896813800479207443794754069444037160491893484) , XLF(4.485942758183979499839386031077925812118913239088604505756780885e-05) , XLF(8.586897119581881946455855366473450438977777073518783791843514591e-06) , XLF(1.356786425515225450414139582244378792263737470489610032664059996e-06) , XLF(1.731450639888094511993173152423750268639431190788376427255928527e-07) , XLF(1.734391495596260318411629164977478836902810344422369563407631763e-08) , XLF(1.312519963625023011649641287169535677432910168153653029222283589e-09) , XLF(7.116667913729384256767172232280486361266481023934235321609576606e-11) , XLF(2.561762818874284177830619219872486982489712426981248877572083767e-12) , XLF(5.456803126071488957131579484548320376763307645485266276969874438e-14) , XLF(5.725618853600683556320679419460948917974830177651689836408274897e-16) , XLF(2.163235873040520804321209600276010715896047899427223438334362024e-18) , XLF(1.627930266093210325732162142775167614479708883686638173848074718e-21) , XLF(6.446950284384473396194853219204687205298904410428911894117160124e-26) , XLF(1.921340907926700531302348738671746017127902630099329916869747676e-33) , } ; const xlf_type K26[] = { XLF(0.2644798424607552774048811991980218465150397365534503561895016172) , XLF(0.2504887856017991829692305796334444436701302745063324585492411089) , XLF(0.2127461469389779008464347835434384277837578897890244392039977688) , XLF(0.1619081671415522163255580290002194335449936901603964870358944591) , XLF(0.1102620477207957917573831950691314452114456174852272766738085161) , XLF(0.06706490590217548057999341660464585074920074446770653834066683091) , XLF(0.03633793611173714208079274248793968357586338228300777388516995113) , 
XLF(0.01748212759923957906779090484396273956669673452854112169173037041) , XLF(0.007437327012175687796070452190927462037513208984390345239012737754) , XLF(0.002783781640048098131937159035647652582320706343290685471973983282) , XLF(0.0009110999049672569889575776845492335081806818942925863368641122921) , XLF(0.0002587803754286116916077885427017819569122188053858775751476765515) , XLF(6.319989654239854057224998192329376409254354634679400593499494798e-05) , XLF(1.312122006089165669219383899309009164206148754407671215600984084e-05) , XLF(2.283242503425108707751887109405619842464091727653394858571442243e-06) , XLF(3.271131064294833603596646006965978195896928844333214540115370249e-07) , XLF(3.771219350601499909025817390412337172371133552111194519688614565e-08) , XLF(3.395586220032056940656547543845217263254582424860923459274449807e-09) , XLF(2.293838269129721133240221047965400579059274725499429512017556937e-10) , XLF(1.099699158723737741272412125279482954374287698907042400415715905e-11) , XLF(3.453701950140752395678661871611322180258167055904558670237635628e-13) , XLF(6.298308132426832354355811291032932136926182096697855448908317608e-15) , XLF(5.50555614496803912589673189552975085038562422030989531437433847e-17) , XLF(1.664028214545620271061833942025125474216883509069442109191037529e-19) , XLF(9.391905381495421852886054145042158902137280188333533760921711497e-23) , XLF(2.479596263224797460074943545847956617422655542472658420814292355e-27) , XLF(3.694886361397501021735286035907203879092120442498711378595668607e-35) , } ; const xlf_type K27[] = { XLF(0.2597661480314954490051280085087572008430980624510114384955092981) , XLF(0.2464924816235324373051555759464130418821305868671271917413422029) , XLF(0.2105546933313582453141988137125210841069110711477136765436500482) , XLF(0.1617921694966503160303657154764636202990511367919936399442223744) , XLF(0.111701351856793862414318185691739511649153192348446757699311215) , 
XLF(0.06916998128269683146455507386466727403456927439813859077237415293) , XLF(0.03833000496707419001971751722216559003240935837518411075950931248) , XLF(0.01895190932912276030665024849868878752603876559994120990655582954) , XLF(0.008330716500491168779732515225232783042545928128320131681956866784) , XLF(0.003241109439169010679940166036221174416952522236893375074170632817) , XLF(0.001110031921984315012147496801191212237859944977479549747243396445) , XLF(0.0003324721175851279980480846283767671562718664909351654436695317025) , XLF(8.639508906098987654467352301763175881996222940624277201117071219e-05) , XLF(1.928909982919273269632550185805623842461446214880924589308061762e-05) , XLF(3.656219098565212867144808502454044285400517162282842060233746793e-06) , XLF(5.796995715793529289195487759287104247435699979055916074980528412e-07) , XLF(7.545857538583723758712709815348821101221326370173589037678740909e-08) , XLF(7.873266681060161760970757549281507251233174574018126480922714969e-09) , XLF(6.381800706172868290287182410385321257695925294464161869802912479e-10) , XLF(3.853123780337797326251645655491543954445482663170811086233181724e-11) , XLF(1.634782434967694578792630536376448464676212456540831279533423549e-12) , XLF(4.482016828906594474345838349549509452032233138891005403463682158e-14) , XLF(6.999660914006244856942625354797725638690403147728237019258492748e-16) , XLF(5.097825872595305906666325308594472481857801870896126218906203584e-18) , XLF(1.232613731031951474285558334642804766260180573344194541212320174e-20) , XLF(5.217725211978165654780042590607119790058318957517126604855104851e-24) , XLF(9.183689863795546148425716836473913397861687194343179336349230946e-29) , XLF(6.842382150736113003213492659087414590911334152775391441843830754e-37) , } ; const xlf_type K28[] = { XLF(0.2552957845386585957803397036587849362350638221123817343035329959) , XLF(0.2426802311575467844923133687432291811931085336260666173961386035) , 
XLF(0.2084081117541345643603667451843486492872876979636470895586988051) , XLF(0.1615868801077271067040598410475216528491533929462652113663010738) , XLF(0.1129895289240223810551689343651001884038761513381859731088197573) , XLF(0.07114379666863527068953628646706032751217133573426753483226403184) , XLF(0.04025404278609506668602723922038312261082946619919874883025106342) , XLF(0.02041367684452346886614564155350041375436891601559955673677950141) , XLF(0.009248413306443417171382881460940142704893882149353933674678318474) , XLF(0.003728491840093345465290271124105745105171365566261170072042614296) , XLF(0.001331205213036271336477472414194575589583941350847242477006317142) , XLF(0.0004185067528242368800078907739669123327549553351943143987647850803) , XLF(0.0001150514123561442501315497189033097755719807731071248504285418619) , XLF(2.742647548420673152026532723902103573403959627349578069122408501e-05) , XLF(5.611849625282199399793270093496037939878400752537127277118313152e-06) , XLF(9.733436553016039034618019884483946657168682834792172860687696625e-07) , XLF(1.408998469853445599142467531569804169280696933679197605199323055e-07) , XLF(1.66939140692578936639436824451719823202135789424172469486168269e-08) , XLF(1.578873244078938287750296544646028668252755306533744151797847501e-09) , XLF(1.153619689830592433940933692166319782857211337924664602003663716e-10) , XLF(6.231738008510106975983528031884253848778379177204099828726492132e-12) , XLF(2.341653175769926402611742396033138633143419940725944896672085867e-13) , XLF(5.607163698981499532240154768914356781050554778889037154993249419e-15) , XLF(7.50078126347122058737893518430830293159410211166978784413665367e-17) , XLF(4.551679625430850781448451765219582961899621865247389196828763436e-19) , XLF(8.804384678655461682324509559928443276128653174835511891751722965e-22) , XLF(2.79520993499502322610163864166768456782141743800024859864385343e-25) , XLF(3.279889237069837910152041727312111927807745426551135477267582481e-30) , 
XLF(1.221853955488591607716695117694181176948452527281319900329255492e-38) , } ; const xlf_type K29[] = { XLF(0.2510485149905656391299793469757543253721812416821378800649709691) , XLF(0.2390384347586081167724348363244753668745305082436574150193468938) , XLF(0.2063073557906651272468049868677777580099990951001784836347157803) , XLF(0.1613047150319719278025944302418227542773220834341139401840675815) , XLF(0.1141404822333535168577549932261312777498588557166895117210483621) , XLF(0.07299361587265322173086511606622265519786095000735526787550484744) , XLF(0.04210937045603605820224334432992177164567919024455180824997239336) , XLF(0.02186265993825160291585850200213098448500260969151149601825641797) , XLF(0.01018592380177695590924542098091426580332117317001921915425462714) , XLF(0.004243684185236661081407964393678128358561124789990949375370535456) , XLF(0.001574312188186834215622776783461920610579750426430965127438684295) , XLF(0.0005174139130050184422149226611160799482852775407498228608739696901) , XLF(0.0001497419572616025269653996150938500946623941220169738270137507417) , XLF(3.788247489554460045564579026511372096790421336517618845174653651e-05) , XLF(8.304322906561694415113181436483895466140961118086518131838124573e-06) , XLF(1.560659267130577520569657849905253238964468060165746523796062949e-06) , XLF(2.48185021058764099258735124668197181151137596552762678699985333e-07) , XLF(3.28604319531239548001633056661915304736687157446559114119589449e-08) , XLF(3.549385230039337373280011570076933151728923478121471611603784632e-09) , XLF(3.047062215049192483978699114598042032650077405440376993342055281e-10) , XLF(2.009186130533520803368952420942216337970566271772921252563834733e-11) , XLF(9.719242659523968837930344858650502902558446704924179198869497398e-13) , XLF(3.236591287711976132404880970900605558688215320135903805201068541e-14) , XLF(6.771425826704358661176337961509682975353784732432598490982058341e-16) , 
XLF(7.760248611819620254065873507071065336023055541850541076076115164e-18) , XLF(3.923888252362185971296499037279863195198161983482075172037369412e-20) , XLF(6.071989750234451635932287573251231103327119357186118350972889505e-23) , XLF(1.445798242240005904457103602959834287178467094260233099196045485e-26) , XLF(1.130996288644771693155876457693831699244050147086598440437097407e-31) , XLF(2.106644750842399323649474340852036511980090564278137759188371538e-40) , } ; const xlf_type K30[] = { XLF(0.2470063825838950539981826642019578791036815251851126621866584569) , XLF(0.2355548472503042207632856664840253002914234715611641313043339007) , XLF(0.2042528367342451500647704408564989421752168915398637510272548352) , XLF(0.1609563726345440522679034900331040111295862626195406823381281027) , XLF(0.115166731023435080187505477980783454956880491716308512676874906) , XLF(0.07472649790807482555970650366260995001234635162396320669577105959) , XLF(0.04389598024219399751930654126224021343152700514616345082605181085) , XLF(0.02329483325687316202341943768654774366704175243883289543101688787) , XLF(0.01113908103006694415399426929004099415231573149524019056765580827) , XLF(0.004784380186109439977138505548643177868270930446627690441519152815) , XLF(0.001838855638719593980672984689549999556824666662675877902976900031) , XLF(0.0006295900994787331740833719917196686031235926942222033880558536176) , XLF(0.0001909968461267294703612716590434234836506692408158201743338420864) , XLF(5.101242397662600066070288424824321840722160809168892806747318052e-05) , XLF(1.190386233741890326952688396254449420346287816525756328750856589e-05) , XLF(2.404761291228170986795211978641323296520128455513522888826821975e-06) , XLF(4.15905488479258144735875718581496675278824175708083761555050244e-07) , XLF(6.074934643107642639232887167864088878998191131147513565261293873e-08) , XLF(7.36860367073920759289364332203367170196131139945307826038070591e-09) , 
XLF(7.266202637623134813507435841899050998288727478890686660921222158e-10) , XLF(5.668894290196139955750022974728273001196888706965880010984158742e-11) , XLF(3.376694008716038465429096336651499144792908319062713253918133816e-12) , XLF(1.463867825075517509173590827330242322011996182252275675164125428e-13) , XLF(4.322439274033272501435167247357171370386707843502203893903226559e-15) , XLF(7.903593695386417108512749412545470012508347169656223475053435653e-17) , XLF(7.7608153308629754695518938539112354509260327396572117804555393e-19) , XLF(3.269920612051789589469841540217447447757529848339124782245583539e-21) , XLF(4.047993276152608992949455387235544979288477165048926452353108912e-24) , XLF(7.228991211202101056290513040800760085660839973871279251701767593e-28) , XLF(3.769987628815905643852921525646105664146833823621994801456991357e-33) , XLF(3.511074584737332206082457234753394186633484273796896265313952563e-42) , } ; const xlf_type K31[] = { XLF(0.2431533907099914536911335912093164390105016480631371677980220911) , XLF(0.2322184108440457520335659215891154732902626564766773598659003504) , XLF(0.2022445497916713612904401396071983126959457592886985022359092341) , XLF(0.1605510943727957606200166857123530684930985010439813987658227731) , XLF(0.1160795541686762448975924492384011372914981292227389442986345971) , XLF(0.07634924204183750118188693439006518033552052365723189552966999136) , XLF(0.04561439692534073582971464076317214451078373765965634516884982723) , XLF(0.02470683332624936709658774248173555012465304285432989843209470635) , XLF(0.01210405170337634657496139881327214456031998832898543722588986164) , XLF(0.005348257175468436592733234934556336247307631573552183501260265991) , XLF(0.002124179810847007625944334608425419265050828435211963419544632482) , XLF(0.0007553037547832393680591509648293096853273673753431242493366146834) , XLF(0.0002392959458797775404957801087309044140708000748464254934852563273) , 
XLF(6.717119163857597716117534355063787787933137246672860742388524365e-05) , XLF(1.659423630606070769814889786267546456403088475167778642097663e-05) , XLF(3.579245446687846586847470744138495890273768791413846978721499703e-06) , XLF(6.67615311102296842831535120948966700100664306954309237621847946e-07) , XLF(1.064458694686305377917729726048725813319107563579751170658775524e-07) , XLF(1.430357843908363941226034488242689835566436122185653762001914924e-08) , XLF(1.591661029154995646533062498444341150904959055378106712750824158e-09) , XLF(1.434682460258225125227618712868506315076726485041949206824066056e-10) , XLF(1.018285276442702641824391223556387603932776350858361068204237623e-11) , XLF(5.483927710960996995059429419011657855149994421641330895072921354e-13) , XLF(2.131980675968432722041796381893738868254316378984450049288697217e-14) , XLF(5.584350767438550431308198158780528045422394924412856699547800563e-16) , XLF(8.926418454063486283911165158806053151292040017375660860747233774e-18) , XLF(7.510845132464196988794721581869678922845836089467580493560580742e-20) , XLF(2.637039630855982909636361260502286768269702361395398347060564958e-22) , XLF(2.611608601743498983992283900018947009019394920807376649472923183e-25) , XLF(3.497898973162652407156165950814192025144470885817368416211664337e-29) , XLF(1.216125041553517949629974685692292149724785104394191871437739147e-34) , XLF(5.663023523769890654971705217344184171989490764188542363409600907e-44) , } ; const xlf_type K32[] = { XLF(0.2394752361829221817846148566387753318305833644915735273617097364) , XLF(0.2290191124318211375311131197709823231249720086536896632118378261) , XLF(0.2002821724877073427303301182686708917071342169893179236351048113) , XLF(0.1600968819347263383597895681374748488026922723543680790910644143) , XLF(0.1168891201931103844017234446576664165424080610248912609188539078) , XLF(0.07786835312321691904158976449493395836830827577036924307465722115) , 
XLF(0.04726556304458340388559170602061634881125816890725198778363723928) , XLF(0.02609588106552989054017244706833377405770588612985128849102640583) , XLF(0.01307733402110062621432431553687810217587404634177948413053185257) , XLF(0.00593301238963447872013649414175623560280960178397738995378486318) , XLF(0.002429499632678958350928248359100875783705505052053977550788446081) , XLF(0.0008947027727710059305092138463987346570348973260610299522395811407) , XLF(0.0002950637886591241515858603237686594983819472913001311964084960049) , XLF(8.670709043241161506297580431720561955252222180477637570421955511e-05) , XLF(2.256995589261363055694004144432232791484630015725459920396453305e-05) , XLF(5.167840444406296672228028864589199433709184566625109359116063148e-06) , XLF(1.03223469058837950512818559856239354093621400805587780272646086e-06) , XLF(1.780784984311083039987121746766376893655288589431527294211019469e-07) , XLF(2.621660553931816945077782463271484560429397276865484684903836435e-08) , XLF(3.245475733445526912856694463451450153401563044190775977433644264e-09) , XLF(3.317388214926496219546365598165831311226381122792237286235774999e-10) , XLF(2.736300253045405521947599022351360203921690237386253492222301629e-11) , XLF(1.768499527819998884340719103144440512658519884419705703587108157e-12) , XLF(8.617500117591645121054305201095045019069312239403192019672482871e-14) , XLF(3.006047917016258807842884543507631924570501881087358079766333467e-15) , XLF(6.98724516269948371765634291490922945330182269447215086925703998e-17) , XLF(9.765701205041633482272657912981146179840167400869040449293483398e-19) , XLF(7.041662149991598130024800759618450766450066902149682404273932533e-21) , XLF(2.060190545541195443806482805488083407487478935663968003320338075e-23) , XLF(1.632255387870898205044882574552221648281202025963466104739310744e-26) , XLF(1.639640143670049061242264899305287389509703959539834466648927639e-30) , XLF(3.800390754854743592593670892788412967889953451231849598242934836e-36) , 
XLF(8.848474255890454148393289402100287768733579319044597442827501418e-46) , } ; const xlf_type K33[] = { XLF(0.2359590855358157174562983658246484541287590392189529863394692754) , XLF(0.2259478610773664999159673395224707145478628579219402080328043893) , XLF(0.1983651416283094633628491558427352863166064557829593950008655963) , XLF(0.1596006787009698764773333502827644705301554391604520043437249051) , XLF(0.1176046041927883799147590620444202406375224643925011152964985884) , XLF(0.07929002188392354392709176408903581338437730844097991045770629785) , XLF(0.04885074439501923264805168619078033025183132070855136887471906705) , XLF(0.02745971056164966029845868181164611525931524940164791696227290024) , XLF(0.01405574891012765545894267238985515183058115627732096786099470917) , XLF(0.006536391435344104006788220725201366865093755693678911181018174177) , XLF(0.002753927565956844700389110130965060293590392805487010254384519492) , XLF(0.001047823667292070134738082224680451825036579459177990719795705759) , XLF(0.0003586664721341373449105998730206609235482388390357935186615891611) , XLF(0.000109956477102946179083018920409469679969979836737783655866132563) , XLF(3.003330530272325062918881017946051460406795901959898126245397172e-05) , XLF(7.263767603007417823229422451086083543756881657311748442559435002e-06) , XLF(1.544318983649526595455153459036931428409723714339539468714306576e-06) , XLF(2.86136918522843609532860973209532535177797435578050300540163503e-07) , XLF(4.572758656116515550678701374129815814886883365739945487218455533e-08) , XLF(6.22471576710428745038277435092426651536226381031244788413068292e-09) , XLF(7.108221576497256933612943327331823736021150810130218268162282208e-10) , XLF(6.681624474900825794013658830740189292106839474656048917884543101e-11) , XLF(5.048209551989393695477307478708734086080024899097215251480193726e-12) , XLF(2.97346053595301734542347796266193646347639959633920335606134405e-13) , XLF(1.311822142353833655547986665151745234833100425913807650714268743e-14) 
, XLF(4.107886195493344683050207511099284255090238472341113644931489811e-16) , XLF(8.475789587044821110610741230883567625911103326846985613406358096e-18) , XLF(1.035946599963927813153270074787598493449497657993800505301130876e-19) , XLF(6.401664378528798293490292785591958632745808448205180571913414177e-22) , XLF(1.56075197839983065762749997783473979708302251355375532084224701e-24) , XLF(9.892456933039535835227104986073303495410077421741767434607697583e-28) , XLF(7.452909743954855604256754524042161704409432621954545709412661676e-32) , XLF(1.151633562077195028058688149329822111481804076130863514619071162e-37) , XLF(1.340677917559159719453528697287922389202057472582514764064772942e-47) , } ; const xlf_type K34[] = { XLF(0.232593386403171396972319320096661029681623530213761978857298636) , XLF(0.2229963824534834569790693242408311433496417609063373493466634828) , XLF(0.1964927136395438224542390612456443969072827288288345739088289167) , XLF(0.159068521932821489181941431539051445681724415176025632666097727) , XLF(0.118234292547104182116969303797619048653316671223121009448617907) , XLF(0.08062011619977089269654175454358559454704304058995717961593706373) , XLF(0.0503714524656091006782715751891107252026303995625879705695098757) , XLF(0.0287965043718514712267799919664083179990541991689252790978516421) , XLF(0.0150364267188873132404452632367368218705962239510689255700757323) , XLF(0.007156210067138532481612210704392597066368931568408435523217372001) , XLF(0.003096497807525672885778223797656683878735146588308564145827369317) , XLF(0.001214601754428449862870974077972233081314535498931393352406787783) , XLF(0.0004304102760522953458217320651014039255063291159966120350148450428) , XLF(0.000137239132962966729679915589633362426750831124878356007516952238) , XLF(3.919133281925342991296527983385413783843623195149606539520876673e-05) , XLF(9.968853753484431505243145243250836227968314207484655863011687693e-06) , XLF(2.244184813261852393051984355945908038609535598579467437296729466e-06) , 
XLF(4.437584736304715706550836320178274990458265618867129719148693418e-07) , XLF(7.638584633154218524895475860530940034318759392313845363296626096e-08) , XLF(1.132390855134893640514856049747591733171199342489844789122101778e-08) , XLF(1.427141617172937734447781288540751497663128218613934487747010919e-09) , XLF(1.505030873824642022647772040745917991066190077725257298457365579e-10) , XLF(1.302293980576528753074543341740290655611330307947676203609618036e-11) , XLF(9.020495803887933509697870883866591363112278975911263449290676557e-13) , XLF(4.845651644267249725876208798887670421954517592588141562018491537e-14) , XLF(1.936628847214794923844636926309228493248010423396935061885753761e-15) , XLF(5.446201250299691432949149569772882618222436118854065387530408424e-17) , XLF(9.97735397286844815210376024446164869864880156281687603335245999e-19) , XLF(1.066560216690277439027528801249886079443918516197079765141183346e-20) , XLF(5.648620498296544748711060477686891248220376019733246309499455876e-23) , XLF(1.147612461233895126293182743538844060708855946161761893333227377e-25) , XLF(5.819092324730288005588719381789210564324789565406785112215850338e-29) , XLF(3.288048416450684799733932358209954191125149434803487303836565891e-33) , XLF(3.387157535521161847231435733323006210240600223914304454761974007e-39) , XLF(1.971585172881117234490483378364591748826555106738992300095254327e-49) , } ; const xlf_type K35[] = { XLF(0.2293677076664401271784713049334263188739172397893755593279967251) , XLF(0.2201571275539160046405431638495996960632326360592649784947446026) , XLF(0.1946640119552236303743161631810316258541906806786349131299753149) , XLF(0.1585056708491715606221670267030486856488525747007608615355350375) , XLF(0.1187856764263017623097685141570438577670986129890158204278944259) , XLF(0.08186418028643146819193118572113900345086661699975511551795800029) , XLF(0.05182938099444126532030017569852453131959382261133681586554115543) , 
XLF(0.03010483529164026270012888570948420530445598881730021351470061263) , XLF(0.01601679093379909462201110065789581678419643665342684253944871924) , XLF(0.007790370324980181962172031118377785214988280616467805385061091304) , XLF(0.003456187749275864831628318370013289220943726481849477397348055858) , XLF(0.001394881830894281477115385742874044020959659045136498132734027856) , XLF(0.0005105417212643599961934189453390712919369820852644932688368087745) , XLF(0.0001688544522478030145662162315598734720104135004792197715672380794) , XLF(5.025290875513556611743286048709307428452429225282949940747905173e-05) , XLF(1.339244878115252143168460112377397435108924552155671732287881863e-05) , XLF(3.177893081406759311127532196142053635853560933825745597140099423e-06) , XLF(6.669708758220990183740329281293378707395731264419459907312153098e-07) , XLF(1.228449083874475833952789663312084931248189394129335668656226159e-07) , XLF(1.967217395115326749466941238800001626008087355310171583695017196e-08) , XLF(2.708724424500013968702921305539883115105179417623380101869964739e-09) , XLF(3.164204546646098200852744147557605448377911650677027536037520867e-10) , XLF(3.084802998595784065804469297386730596283881895536636494065996765e-11) , XLF(2.459387884929520217685108537394021582548944363040355857012017924e-12) , XLF(1.562970385071201129581160468031209049332802199499618891837422442e-13) , XLF(7.662055194744004023149742130217217211601608874531379310120394546e-15) , XLF(2.775449723825676230998487510943992450630246686236973607180617552e-16) , XLF(7.011857792335530053600154348919240912429555990886714930425367196e-18) , XLF(1.140785470773381187188883324896434081331128898700875729188578235e-19) , XLF(1.066666780704534872146398882504643257820984338534937643031278574e-21) , XLF(4.8417295785532429558963902459429350824599287935015421006877345e-24) , XLF(8.197235013088449504892470770612714720305125406655615407146614939e-27) , XLF(3.325195617421974751985539726904372177699371498751888853068667378e-30) , 
XLF(1.40916360705029542901992713633411150824439414572764697353427631e-34) , XLF(9.677592958631890992089816380922874886401714925469441299319925734e-41) , XLF(2.816550246973024620700690540520845355466507295341417571564649038e-51) , } ; const xlf_type K36[] = { XLF(0.2262726033193025603250026961787552431761002092831334501195986193) , XLF(0.2174231934762547573979025163678131841933370346973006504825964454) , XLF(0.1928780642708993198150489552243734458683110093806103867666934382) , XLF(0.157916714771245471182187362700511522079533717536586386993540802) , XLF(0.1192655351356437735611207013301014689985786980849315023181796423) , XLF(0.08302743954733596579673888656060324672180689118506461833396319663) , XLF(0.05322635425583203902854734503976569354261767564078219114151144933) , XLF(0.03138361431833886281913185694309933164017302626515118816252747652) , XLF(0.01699454011224549731390943174863947622921800421448274214241759549) , XLF(0.008436871980737005995433520325579085751679328395292049324757982058) , XLF(0.003831936733701335311153721807101114858191493774528418856043732978) , XLF(0.001588428945859191186661394263841957524409889286399751239378003798) , XLF(0.0005992488046180470089115172071984013979514130229429753072936855875) , XLF(0.0002050784270569641418682458015623939544543581202637404642007967107) , XLF(6.342593457692011011165122198874392427979252823085343513212910521e-05) , XLF(1.765020045619630470502992506826992254616554997390538552799755984e-05) , XLF(4.397095566982974972766671788751505118362104606307141684572355592e-06) , XLF(9.748817594591299406374403903434212686918255750668517299103556906e-07) , XLF(1.910335584435182091849089349416308394216157978863551138184076437e-07) , XLF(3.28180127603619067133664581537775879262865410669404651289711821e-08) , XLF(4.895318376187337599364453503292758714960124261272058696993983225e-09) , XLF(6.267885629331358091307407372378115776045489191859073721733015827e-10) , XLF(6.793625979416651380725746701448132563801743324215169860583319942e-11) 
, XLF(6.128475584914402444307120339476799135774170535723983646114778913e-12) , XLF(4.50547609318932598268313128556465705875610761956542226098244126e-13) , XLF(2.628850347740500761291087755840699461657319678964183203568648277e-14) , XLF(1.176727141431960870111444071616268076459732772878764381276960252e-15) , XLF(3.864931598386980375828467172532402262153841543317582238676978407e-17) , XLF(8.774451871078550729040403481616106947725606652775722961197963593e-19) , XLF(1.267983237181790921925271974546360414972702047584821784902094414e-20) , XLF(1.037113427733002920991891662926963035643493168492612420479337339e-22) , XLF(4.034806053419644781897427290605422221120507821982106629523391065e-25) , XLF(5.692525665093213357152890845911224209290976933277455910624610011e-28) , XLF(1.847330899508640844303400361400050024644930335484549425014344225e-31) , XLF(5.871515029376233731681189944235854475310369731921032363643067532e-36) , XLF(2.688220266286636386691615661367465246222698590408178138699979371e-42) , XLF(3.911875343018089750973181306278951882592371243529746627173123664e-53) , } ; const xlf_type K37[] = { XLF(0.223299496002640021111359341134510837820183981581011478874017971) , XLF(0.2147882544510711919569190407076284917078356366124289559233269325) , XLF(0.1911338318377701366704322310440775069478578641968416067177716772) , XLF(0.1573056647303090663450359755052328491126028641865345288091842659) , XLF(0.1196800103136548186980730211805774069327031868380628530784708314) , XLF(0.0841148093577975973749259321703145040444007714777888350754718989) , XLF(0.05456428507461703877753305507345506063872207667011783872754096296) , XLF(0.03263204441867352147312016385029071527210754235832393678102225306) , XLF(0.01796762892789281297789050869283267424030324181267526910560393879) , XLF(0.009093820131609218671994573589787800121382734184092367680596834041) , XLF(0.004222662227863207362593047213251734624576971362369803251126590269) , 
XLF(0.001794938962651626443754254485735834725280940135628014853656370057) , XLF(0.0006966631614214483506298451478109539642888517224267642626198369499) , XLF(0.0002461613889037040653848938175862969606358749258560528091091989737) , XLF(7.891476585255874661485750460874987674736761466179528632804046175e-05) , XLF(2.286273730161162440708934743979648688795161747897172320704465225e-05) , XLF(5.958757095231239463582163953790311412500009195669303716636561131e-06) , XLF(1.389808819203188863963109445750963562889095481299993332919349836e-06) , XLF(2.883168095103089944418558035605401391582237540687256843976959461e-07) , XLF(5.281987510634492186050058937268477291572198976666560146322165234e-08) , XLF(8.474026572116705375272692020645806137489480763752851350101794184e-09) , XLF(1.178765563824872479781404055538260761715745148687999298197863157e-09) , XLF(1.404908857030410417939226894077512593218815262042967158586447575e-10) , XLF(1.414226082541827831896255548728600395444153723123816966053366058e-11) , XLF(1.181466829789199252598650799476978669825215212176940062592154956e-12) , XLF(8.015206393319374746358231956413350088107977121792447426077867749e-14) , XLF(4.296426999349023767506226347179792876306173244100341041364744046e-15) , XLF(1.756896282826159479860087487893169396786118506068844647742479667e-16) , XLF(5.23418957804375702150428914535730843363873159031302981413153004e-18) , XLF(1.068102073504933348068514703729203635850250998882033941661537675e-19) , XLF(1.371163454806844964034306222557904730661078152161827312395247462e-21) , XLF(9.811063214056637673871466472915012329695801698454977145693506043e-24) , XLF(3.271481842389301931924819592115923152733077307704881580486499648e-26) , XLF(3.846301688411045627417622124756889385663724613074295840142706519e-29) , XLF(9.985572432392002660729702221646166722654454485430460385339547514e-33) , XLF(2.380343930828203250096293275123389410318035510729455592345974374e-37) , XLF(7.265460179153071315382745030722879043845131325427508482972917218e-44) , 
XLF(5.286318031105526690504299062539124165665366545310468415098815762e-55) , } ; const xlf_type K38[] = { XLF(0.2204405769366256970084169101999345046475155217864402442370986939) , XLF(0.2122465016001463982769597921694014751984382351046329008674151314) , XLF(0.1894302324820040239809536776565368553527188945566814085571006764) , XLF(0.1566760313070161487559819369364610702802160126090123377780994119) , XLF(0.12003467194665939485786939771061412359013596605360037390219396) , XLF(0.08513090649930807292034127820225861189231157534183318300894487683) , XLF(0.05584514089010038394293372545899290553565874822225800937679702699) , XLF(0.03384957964566861365610554160816927573017178390561622655446173683) , XLF(0.01893424898900674267434828997472419446237266210211030685789132954) , XLF(0.009759429668081820299226072723961361699852524733868534595028874955) , XLF(0.004627273591928908384628961893485554013932438920848044612716714737) , XLF(0.002014048690147923995843389202126266680297339266680167160033373356) , XLF(0.0008028629319168792278420323133127458233554605905344091357011991999) , XLF(0.0002923264476485619992151126852992078508207007992830161278575508514) , XLF(9.691789209971075958155369544692944501260932727351080996978080146e-05) , XLF(2.915430610005127433934232847519307708173451642696354306014928135e-05) , XLF(7.924768986753194064224026455679239399760361230103279829902516351e-06) , XLF(1.937347334574533840625678052404961827208693849112806843319412077e-06) , XLF(4.236410274253766669641293678291219325903573489999973588777629044e-07) , XLF(8.233986488669669474477542379484552025182717060094972455605030243e-08) , XLF(1.412006309587132883758702680733905355710456275446366297888007001e-08) , XLF(2.117920150530041569243997601790652447817415069997488516491460273e-09) , XLF(2.750222696759650938650076951167747261893237013000199528770556179e-10) , XLF(3.054085677231035555741946552752095824577983897925944391603986484e-11) , 
XLF(2.857677531698228695719000242655664077264882684938181405084657439e-12) , XLF(2.212566703290411454934409168275948437057448428697313513544409605e-13) , XLF(1.386048232855116781112711477638211990175193152792717967895327533e-14) , XLF(6.82928690846568068072289463812514432530187305449536216633021178e-16) , XLF(2.552292361843761159576520112101936843190693471537156236744552547e-17) , XLF(6.899359854653215220078582141671976560272667002299645052441865425e-19) , XLF(1.265748459889749896396263637556466326468388038567136764366364157e-20) , XLF(1.443631273124267397845631204497068288354676101501317357403726348e-22) , XLF(9.036864917446741520986280502909692592017688570218437206565926792e-25) , XLF(2.582758282954911159732641421914675627473535126092313760820233202e-27) , XLF(2.530461865750828637062695157115597074788538499675973951738962501e-30) , XLF(5.255564438808480417827905481574723449125361636074636673970335044e-34) , XLF(9.39609446379553966679306862932193423523228450345089368676370266e-39) , XLF(1.911963205040281925100722376506020801011876664586186442887609794e-45) , XLF(6.955681619875693013821446134919900217980745454355879493551073371e-57) , } ; const xlf_type K39[] = { XLF(0.2176887195898937418225228637634886925112187026714183598640155194) , XLF(0.2097925901575713739300943625860955993681356413653218881549740614) , XLF(0.1877661586628236074650732494072900129398682751737765182005378242) , XLF(0.1560308909684679854638932073698496277234159697295131819795777603) , XLF(0.1203345770900273918424405893668202237608908431918603104154506847) , XLF(0.08608006228445636022381341750674583429178721130612337831452121556) , XLF(0.05707091647022090650605715862631512164008327719590608517704315008) , XLF(0.03503588912636704584373310985844619259497761295043203502250249204) , XLF(0.01989280990678619245331062592578925878151818297699242192697643047) , XLF(0.0104320272392785324839950784871441304489261799011150474181762851) , XLF(0.0050446836480750103688327133567327459616723423762828768699042938) , 
XLF(0.002245345431970502356079276190897834658912425882331386662686485725) , XLF(0.0009178761359213730236256507345524907088589992439713015273016294646) , XLF(0.0003437685569620565308917592092068127925162273114092964050298815497) , XLF(0.0001176258995931459076143258404543567985096395470309544955069792735) , XLF(3.66514050867204443983482543748153114397699108909416748703738261e-05) , XLF(1.036147175817185229768274172229886810622199038201848995730089524e-05) , XLF(2.646374682802178634539845928806185337847931758786862635176799508e-06) , XLF(6.076583785136942665251524581155153893999646227607286484266408107e-07) , XLF(1.247377602383245949700557859749303172341881221988884698998326298e-07) , XLF(2.27419552149453383211557478462096096398152348233185102834678675e-08) , XLF(3.654584067587039982563063830020067639519685104027061714413099011e-09) , XLF(5.130268696079998755250251755761287624609207855983189592261635454e-10) , XLF(6.224887524344096158378753631118560489854090424681339265936255029e-11) , XLF(6.446335067410568233721688039965402588841603764254024476196436652e-12) , XLF(5.611037734208879018398495653884589609306040490177947148128439037e-13) , XLF(4.029059843849294268462748641279427903112077131199882447769170313e-14) , XLF(2.332003925046160275917548212411228928304011211397220903910402942e-15) , XLF(1.056674350919289574405642041578659311663089974488470793070879336e-16) , XLF(3.61058366974535455105039274668111557072101884122690397960658425e-18) , XLF(8.858288812262929604953207128337764151655320173089941376850886551e-20) , XLF(1.461301578602453365903368332969905489737926269998600765347940926e-21) , XLF(1.480885868071521374205135530487221137601178174012990426299692481e-23) , XLF(8.110243802472972916898592216388787503844668669824849218839263716e-26) , XLF(1.986742126782860787327043648460079156127530036439742245799075204e-28) , XLF(1.622091029930926662817814204494632381165097453948872758655363976e-31) , XLF(2.695161250857309449633620024278902923416156711931740123200877163e-35) , 
XLF(3.613882486075207632816242386973610867942077679382692031385706149e-40) , XLF(4.90246975651354339769415993975902769490224785791329857150669178e-47) , XLF(8.917540538302170530540315557589615664077878787635742940450094065e-59) , } ; const xlf_type K40[] = { XLF(0.215037404911510658278346721650747989352339032399454935358848413) , XLF(0.2074215930929292648304248878205716030490401958019418519872901174) , XLF(0.1861404915973675869777636007220157795933022088952077557895729011) , XLF(0.1553729427643562790592521176202138792149273944630670168737533431) , XLF(0.1205843221079231459029456557560999846112101923595942469674728979) , XLF(0.08696633666088934757008113142520029563544876382398313866375261941) , XLF(0.05824361211119029703111157220103844065220625255480779858832324588) , XLF(0.03619082544391560073706066797165748196594630259445553935642919706) , XLF(0.02084192094868450755698689338998483434539555053148618099041525259) , XLF(0.01111005124276135846184469621827366973865248717597376681448075581) , XLF(0.005473818268101626589548631485465720144015566211668339872005945125) , XLF(0.002488375856377021353668940486233898696385477899687424818960169327) , XLF(0.001041684388076669555794528095299978538897754976153786533556147602) , XLF(0.0004006541297978540459086476994000625870946564296358498190227813222) , XLF(0.0001412197289718143846998331579899848440890685313381599381709670285) , XLF(4.548144759849271658303381735327516718998824605247315400987889054e-05) , XLF(1.333910605109669115159346971640419187814474853155491302455529031e-05) , XLF(3.54899385187398403253555804882340908918157146610707525729958316e-06) , XLF(8.52826819481390746220621846688289393591494120772903254164456423e-07) , XLF(1.841625205024749270352837368664239393148516973766892610299271236e-07) , XLF(3.553028241409532416634397998880950984910814196902885790594605331e-08) , XLF(6.083084627682534483820831472623902200861022314668324764267243622e-09) , XLF(9.169847053061317531059603444748247282435261743007927038923590235e-10) , 
XLF(1.20588113715728426580108014936601293624780459771024985406843947e-10) , XLF(1.368378280513567575679972923800506394669605794227480414589997761e-11) , XLF(1.322506757193870324459155284283635797968995389183203925433382986e-12) , XLF(1.071600108036426408260527401382304594019428685803816815344331944e-13) , XLF(7.140672046564404583634289695773184871937497627431314301007072792e-15) , XLF(3.820647662351564131368825482608371630291487006026066015678511571e-16) , XLF(1.59276252908966442501279711860528451112733336688415988831106353e-17) , XLF(4.977521597409959281316682420641641820962825002333230553751516319e-19) , XLF(1.108618898814130569855935513019659017980756081992854363490030542e-20) , XLF(1.644704335674735695419956371130889152814892879764547378507652624e-22) , XLF(1.481069641718248788827963282024703694923119970541361102457428896e-24) , XLF(7.096615398197907326660907359421860437649819313983050773305479692e-27) , XLF(1.490059155490333904939259888284082432053424772678683019502423444e-29) , XLF(1.013806928490448092735007343665478212849190512660543138218013287e-32) , XLF(1.347580625476453804942817060266969521120728875991166978215093243e-36) , XLF(1.355205932278202871112162176688497474386840742888254980046545109e-41) , XLF(1.225617439128385849423539984939756923725561964478324642876672945e-48) , XLF(1.114692567287771316317539444698701958009734848454467867556261758e-60) , } ; const xlf_type K41[] = { XLF(0.2124806563390982712897035436210733494648704444800379947186874374) , XLF(0.2051289602446188198714152490088700742189987233033120225446880449) , XLF(0.1845521122603517911198059596111253025294928437616173514870335585) , XLF(0.1547045569165107007044553041119892038005273756474980925528002872) , XLF(0.1207880891639810803960403965259431830414376112865825734114054577) , XLF(0.08779353277096463852281991654176227001546915838269455696887426559) , XLF(0.05936521635487762619392479017008255825904363441703974245126458241) , 
XLF(0.0373143969558195276387958878432390525091658636031840953551422071) , XLF(0.0217803735028194920506377746579272911435951054513436180346662255) , XLF(0.01179205028024111835911643521497867391149813468700157690434154313) , XLF(0.005913624198780104648412380295962441022547859029450080837186743792) , XLF(0.002742654132673340374794116647565075451537360564608161247690977095) , XLF(0.001174226813419448281574819394568540738959460782816568717024820381) , XLF(0.00046312112619113668629917242016034809820650809185437324478696911) , XLF(0.0001678692280470183587084393823778678268449966074236864824047694499) , XLF(5.577148447745731442117467121420280155437928063937741918311904044e-05) , XLF(1.693121074353050859812123924410868383063689685851844352658813387e-05) , XLF(4.680419922278284546546993071520900867994108127005781922068112032e-06) , XLF(1.173482971647855029304517285167649445112790451575258071182751503e-06) , XLF(2.656385862845341699661132347944882457731702614376528680249415485e-07) , XLF(5.400886217037518942406260685809766587048927714386097412247519296e-08) , XLF(9.803731306466928145596330261791630057446501589397023858171777293e-09) , XLF(1.577796729651918402165165151322201546453736533155025540870522989e-09) , XLF(2.233189668676868565025027920181951096727102570483283371151014633e-10) , XLF(2.753501820558423316897864045879930790136587770200659995611944297e-11) , XLF(2.924439370867689570743182080918872291143453136750698087111056626e-12) , XLF(2.639716318813207890613855639981798699853131227511779534073488868e-13) , XLF(1.992393452347396835240531704760310555217261328106469607956926397e-14) , XLF(1.232735883198346698394987972242052284937976830986064782145326385e-15) , XLF(6.100222395608102538630270210460291190619230689451770435748415619e-17) , XLF(2.340606208511047622095511314658637754002757907379683435841956653e-18) , XLF(6.691804998538269243932824810018367675456897829986383908524114408e-20) , XLF(1.353310841889026919463010007132143047213413647281774354519328097e-21) , 
XLF(1.805812375929423132838366222036288874471008936873952889591323503e-23) , XLF(1.445084094963922312730665865086917739272356142336356660823689391e-25) , XLF(6.058181514478084016055824081903371400216174277387329759921315331e-28) , XLF(1.090288468789131969021370473619215204229611387531381810745972042e-30) , XLF(6.181749694528669650796418946570979295899089073003822413734402267e-34) , XLF(6.573564026833981237684803791489952784485482997393552280510535295e-38) , XLF(4.958070483944644661421336981824926933332428939907622926851814145e-43) , XLF(2.989310827142404510789121914487212009086736498727621080187007183e-50) , XLF(1.359381179619233312582365176461831656109432742017643740922270437e-62) , } ; const xlf_type K42[] = { XLF(0.210012983107585934630258469223366980748022502429581356414799665) , XLF(0.2029104822106857881361787030354118332002648070675328481382660588) , XLF(0.1829999098960994422193870701714665144524128518996810367762883662) , XLF(0.1540278165709190636072630550552190300129267127338987194810237309) , XLF(0.1209496876188006587862653921170286385589002154076079736559083534) , XLF(0.08856521158649294181378482138273575895248480572509577889635842081) , XLF(0.0604376924209624819005397073376656852800176925799490390505989997) , XLF(0.03840674361805439458944263308356353160125388228072156702355757431) , XLF(0.02270712449697595837705707464853903818222869160891099713751661843) , XLF(0.01247668044562414489808485849713779561969907446282246686562336124) , XLF(0.006363075336662427925177744576371967119177601840420054571082056064) , XLF(0.00300766931292397090421710019973213467411121870493449163701566788) , XLF(0.00131540404825459606386990594265876767141685860184337532311551894) , XLF(0.000531279537776386625415680800597819291334567065478851244350139293) , XLF(0.0001977319913661670336271780529030911254355756388121587737173796369) , XLF(6.76470072433486136015158878897452715932957741149677527604244805e-05) , XLF(2.121398638459245568592843707146677134737967805662537442078848454e-05) , 
XLF(6.078814971793625301302657826867675551230080302571369237645911558e-06) , XLF(1.585886331194210690852473025124836960619239156224302059538383753e-06) , XLF(3.751445330251402672333368627288766900794917383419711202854702148e-07) , XLF(8.008717329585359979039491912816788507621482872854795794807711412e-08) , XLF(1.534730902911877315557080857812820890623575280103408411083532996e-08) , XLF(2.623739170026251627740304964033415634941698810541414796834391884e-09) , XLF(3.973005451738919932435225392467236339352961737075350763632659882e-10) , XLF(5.284521200397839850937316961627704199152507433429208544075438806e-11) , XLF(6.114052365100547746402258888217425873614132035514272619545829735e-12) , XLF(6.082169141294098395113102649036362289587930195375634703616457364e-13) , XLF(5.130759073720258206987878788128359023465081359747466415794946471e-14) , XLF(3.609406854202277120101276756708016547951533964350966645272861127e-15) , XLF(2.074622144260672524561989098338589750883318357516219542549427413e-16) , XLF(9.498975346166503173452453446085134085518851488626431153577578976e-18) , XLF(3.355645179099090411569809347194218051610343749816077682421179776e-19) , XLF(8.779201299061389097407613140101757868525143071442353507200548542e-21) , XLF(1.612391792620475595738911087130487029399119807083273792358992977e-22) , XLF(1.935354307515590038123530831294007793597291295959424315504410827e-24) , XLF(1.376371807704408812473948046388662488337874789949181876714511546e-26) , XLF(5.048542733282435223487283510764330822485428091636675669216653849e-29) , XLF(7.787781037602396763468110701855148115880828468541296165854353164e-32) , XLF(3.67961296121988779208313241761704666753744368630307811918922243e-35) , XLF(3.130268584235839099829153849505299897015462306466585894635453108e-39) , XLF(1.770739458551658808993675563846954439813204498137627562650444547e-44) , XLF(7.117406731291439311402671224969552402587467854113383524254779007e-52) , XLF(1.618310928118134895931387114835513876320753264306718739193179091e-64) 
, } ; const xlf_type K43[] = { XLF(0.2076293306341901087905084403153051316467825932784057050717141067) , XLF(0.2007622583598915126624829341200089921223668132553214532647600241) , XLF(0.1814827885478032561355070472096206090116947753790200083480728615) , XLF(0.1533445537653919545808063175177961988932236866516802008099478561) , XLF(0.1210725909184700436203743445183729957004225486530296065844901562) , XLF(0.08928470634605101518361390641988064514555785275946760332627647403) , XLF(0.06146296768824943626041807984560622581694321780918276685162207798) , XLF(0.03946811591769053148313813928568745700760834764549506161494779685) , XLF(0.02362128085374899697581799340534247104499408876945051964954055185) , XLF(0.0131627017469449677119932334330997743452839198122575324660515188) , XLF(0.006821177651733899480960565112653724691532259788400307230581864508) , XLF(0.003282891962370498895857097649610548103032419579479621437407690603) , XLF(0.001465082234058659534397968967731467324259241328194411651954757821) , XLF(0.0006052121977181860336827468981150929298110514891795079664938572844) , XLF(0.0002309524715512304459460272734365093545633507774771356626814976588) , XLF(8.123084821290463791462722466751374589682668059178490149570383787e-05) , XLF(2.626564071071370580140237593633637369402426155085324687849581567e-05) , XLF(7.785077890544687856535682040258347453221219721611250406896866182e-06) , XLF(2.108234305978987841887627778927324640894309019116951274842940161e-06) , XLF(5.196819790406284192803105294557616369218896873896054981300982793e-07) , XLF(1.161127508193833476227513541220144875803126732496917240691249566e-07) , XLF(2.34017213006551064067698018908143415467924591097883138906977802e-08) , XLF(4.230960867698719546239886933826794834615303594040083319515730399e-09) , XLF(6.818506020327655451473768210347940714791611046717125634657299252e-10) , XLF(9.722984260819391990262563243410764695478702851820548913249344402e-11) , 
XLF(1.216309296686649826284151896497226392601317593836792445150333638e-11) , XLF(1.321447371202488839303862501623296709784650390593955042359086021e-12) , XLF(1.232087840331201040374356413149777333395818632977616003251436123e-13) , XLF(9.719287572487420410405894507076456860008987322485016363180782707e-15) , XLF(6.376107908523757155021067494440053838083242514410956878291172113e-16) , XLF(3.406166018942180003991164333538677749789025865431798063278040823e-17) , XLF(1.443547902297606342292877509123422964556795989331416064064518231e-18) , XLF(4.696544642491655940013240628947969027157960244550146627589934974e-20) , XLF(1.1246539136203951221444099583154531994637623862285437703662776e-21) , XLF(1.876122514686610505363535775124035100162817848162132737468407321e-23) , XLF(2.025834548418558542667247957630183019037835482770606024804462627e-25) , XLF(1.280418312106776592038681606972726548668868463473364920628995432e-27) , XLF(4.109313625770570392479318989427458457927682229580851189981887357e-30) , XLF(5.433338592304275734266195675603773795202805388144033723387262208e-33) , XLF(2.139309878264048141884109678437244783370283364652484179228976727e-36) , XLF(1.455938876395714257658077149794997910040882961168381198664693588e-40) , XLF(6.176998111226716777158826721443346489513130862817522654446481297e-46) , XLF(1.655210867742195188698295633713849395950573919561251982384832327e-53) , XLF(1.881756893160621972013240831204085902698550307333393882782766385e-66) , } ; const xlf_type K44[] = { XLF(0.2053250369589799561320848189863728328524206044656696681116863883) , XLF(0.1986806684216285746479056311990451568976862928600542205359860491) , XLF(0.1799996720050995103384620805738273637414009800296339991869571622) , XLF(0.15265638048970794149396891499643147013763026024163933484857486) , XLF(0.1211599694924723850205697776454976791676648402037818272294932878) , XLF(0.0899551366039917449754077019660821718052734413247763932495535244) , 
XLF(0.06244292567384325792955667561301597122800888622203959854644454689) , XLF(0.04049885655168666387089116694557778494754016923591332002429258704) , XLF(0.02452208501752991992740627869914459033615405241969280114152044295) , XLF(0.01384897390827948253524225316676852376093783209402085565721455171) , XLF(0.007286972944016492149907994782807639178996821253413554720104842512) , XLF(0.003567780059843608139533471318445327382620617581642017026696020809) , XLF(0.001623096932138216675945463586701231640901133875068857517325041535) , XLF(0.000684975850452823533700052145758232919783750768135170923841808119) , XLF(0.000267661342884376598649183637465354452636992844539546092608592071) , XLF(9.664218851340608780073819044631208021070430701551437151312134892e-05) , XLF(3.216573124428932339514620590697338438243713795659145929768318336e-05) , XLF(9.842596214439556916321584642169461263274164575103885132251649189e-06) , XLF(2.760648496866660937967306573212367776852863440784646674641040665e-06) , XLF(7.073330455442524788288001013172655439590005180116023123447777432e-07) , XLF(1.649222881881624136187250040829201491316903639417770210925887806e-07) , XLF(3.484033451060586173861210268819222910241038858180962359939814961e-08) , XLF(6.635495230127985669097496482932512026878754017478817741343782382e-09) , XLF(1.132872368031011551939273358519418254027979511455906889663132079e-09) , XLF(1.7225131883411210032080656875356547399951468565742443886288703e-10) , XLF(2.314884281986423428186117072495208759978029656483734813198869207e-11) , XLF(2.725529432236534027052804441679959473424552899126495888655097588e-12) , XLF(2.782481975779987883752130617469187007959585695217058175069609628e-13) , XLF(2.433062404880983177122795108027311204307840166383121807657917007e-14) , XLF(1.795785954702453861016451974091385582897587230683861277019525835e-15) , XLF(1.099139540392949569114041710777765622666661758047305854800181482e-16) , XLF(5.459461552047714307649693972425174334444359897117157276089705294e-18) , 
XLF(2.142354668320187826615059907431225958590574723300909852631470497e-19) , XLF(6.421008325686980329497414015663102620875934400768000668704744939e-21) , XLF(1.407636038949642381372225090947189049994521039639530597911932824e-22) , XLF(2.133121419700336280269507636829106001592901071514672078869216497e-24) , XLF(2.072254486341352391593538621256032205987663723584251671898187384e-26) , XLF(1.16406727620667973714996166861672862486388117475326672452241832e-28) , XLF(3.268792379899305529926251057624490851910317652339230055234100853e-31) , XLF(3.704550429967297287454197543348943301045372349122849563037004777e-34) , XLF(1.215516982252809619869050113413210346490893407286821957965207522e-37) , XLF(6.617903983633058913744194861139542025697126078953067875417046725e-42) , XLF(2.105794810645471628762910075156793248203512739954559397578088938e-47) , XLF(3.761842881232261792496126440258748627160395271730118141783709834e-55) , XLF(2.138360105864343150015046399095552162157443531060674866798598165e-68) , } ; const xlf_type K45[] = { XLF(0.2030957943865536540845257563368017159398570993680554254367857391) , XLF(0.1966623471932641667983503849039629286022752584705325891868097122) , XLF(0.1785495074895293164024261743620744163344938819715674864034997716) , XLF(0.1519647155705602867840198729922662546677313256806619055476036564) , XLF(0.1212147201196368976835231058683803370671571444043074636235313371) , XLF(0.0905794217617485993361194593303246597611412405075452715808655188) , XLF(0.06337940005385973689225165288681479654981140503620609301383728336) , XLF(0.04149938452454414239339159022180317961898345934336958670902134228) , XLF(0.02540890155567983648885495572152729012330188182268702629347449) , XLF(0.01453445175077237895065524472136411108091694564962643502046691919) , XLF(0.007759541600547815856473283420406724320227961555074486190607789186) , XLF(0.003861784201916173064174124909335973053291704163871103263235520616) , XLF(0.001789256904010516018874900547567984892850650470307427694338773093) , 
XLF(0.0007706024221193263291833570330177395206146666189456168615070366984) , XLF(0.0003079750947254968705160662232338019805080253666806259094858348971) , XLF(0.0001139956803924147031692067924639040497675049288453111987558336487) , XLF(3.89945180077492795870453773467096556071569712680208002621471413e-05) , XLF(1.229696734874433352834644761571785907813785997233983400150219866e-05) , XLF(3.565133432434886136689417516278332335389473791729459241034577254e-06) , XLF(9.473050905292266912259378082458193296509687395266774065147374299e-07) , XLF(2.298902362427839036033022379074399029299233234966788364280499911e-07) , XLF(5.075150427914039311441841518254980037881261797461673684427503261e-08) , XLF(1.014670388089610519176178947662148821924168814158258483202942858e-08) , XLF(1.827797014361082281237534137238264381379753952359723270360513332e-09) , XLF(2.949290121563638949154921759797094191177796667952062882425346021e-10) , XLF(4.234211428005720025175207843537946751781057394048771582461266176e-11) , XLF(5.366785851365562204447097277164665538365721647803442016669722563e-12) , XLF(5.951243863117473803700324980086574130467914267207011200448570399e-13) , XLF(5.712617463391066418298947286141443497179639673845900499583046935e-14) , XLF(4.687379504252380687547005346616401632248142127449838902364969469e-15) , XLF(3.238610483688320895960512036374053619531018560346964931699470108e-16) , XLF(1.85022826837276644437333169717096541542348895826708420896170505e-17) , XLF(8.548144486452398308088501545241573631849929049211122879712760469e-19) , XLF(3.106868215026607054902687175421196155180782062782594555716017684e-20) , XLF(8.580305513535532317376725422100395547848738321659686865640719621e-22) , XLF(1.722309076642825683274042629250139267867581279132874335063737376e-23) , XLF(2.371195505407584175171078450441524132294766060086633063981472089e-25) , XLF(2.072555941638609771992409597742457575699562119545522956447725639e-27) , XLF(1.034761043543283905574648833503052256683148578811495568393626852e-29) , 
XLF(2.542405553137625577589049585847897342916852810257136640722792557e-32) , XLF(2.469700919022356627683983453344582265142041799362836141984489058e-35) , XLF(6.75287214398273661328983748589263265146507588422741180825633819e-39) , XLF(2.941290659396148874703513705370237112829197922636809220006601327e-43) , XLF(7.019316035484905429421160305435895872179863721536869261295865877e-49) , XLF(8.359650847182803983324725422797219171467545048289151426186021854e-57) , XLF(2.375955673182603500016718221217280180174937256734083185331775739e-70) , } ; const xlf_type Poles_2[] = { XLF(-0.171572875253809902396622551580603842860656249246103853646640524) , } ; const xlf_type Poles_3[] = { XLF(-0.2679491924311227064725536584941276330571947461896193719441930205) , } ; const xlf_type Poles_4[] = { XLF(-0.3613412259002201770922128413256752554300293066372075233002101795) , XLF(-0.01372542929733912136033122693912820409948234934137640212780201923) , } ; const xlf_type Poles_5[] = { XLF(-0.4305753470999737918514347834935201100399844094093466703345548961) , XLF(-0.04309628820326465382271237682255018245930966932524242612443881562) , } ; const xlf_type Poles_6[] = { XLF(-0.4882945893030447551301180388837890621122791612393776083939970451) , XLF(-0.08167927107623751259793776573705908065337961039814817852536794476) , XLF(-0.001414151808325817751087243976558592527864169055346698516527091711) , } ; const xlf_type Poles_7[] = { XLF(-0.5352804307964381655424037816816460718339231523426924148811983969) , XLF(-0.122554615192326690515272264359357343605486549427295558490763347) , XLF(-0.009148694809608276928593021651647853415692563954599448264800304012) , } ; const xlf_type Poles_8[] = { XLF(-0.5746869092487654305301393041287454242906615780412521120018827786) , XLF(-0.1630352692972809352405518968607370522347681455082985484897311685) , XLF(-0.02363229469484485002340391929636132061266592085462943706514568421) , XLF(-0.0001538213106416909117393525301840216076296405407004300196299400714) , } ; const xlf_type 
Poles_9[] = { XLF(-0.6079973891686257790077208239542897694396347185399082955021963142) , XLF(-0.2017505201931532387960646850559704346808988657574706493188670608) , XLF(-0.04322260854048175213332114297942968826585238023149706938143476332) , XLF(-0.002121306903180818420304896557848623422054856098862394663415165309) , } ; const xlf_type Poles_10[] = { XLF(-0.6365506639694238587579920549134977331378795901012886043233946858) , XLF(-0.2381827983775732848874561622001619786665434940597287872519238159) , XLF(-0.06572703322830855153820180394968425220512169439225586310303406759) , XLF(-0.007528194675548690643769834031814883165056756744131408609363634153) , XLF(-1.698276282327466423072746793996887861144001323413620950069296025e-05) , } ; const xlf_type Poles_11[] = { XLF(-0.6612660689007347069101312629224816696181628671680088080242089952) , XLF(-0.2721803492947858856862952802582877681512352595653351762441917173) , XLF(-0.08975959979371330994414267655614154254756196601701854440621445969) , XLF(-0.01666962736623465609658583608981508371547272055193351560536097831) , XLF(-0.0005105575344465020571359195284074939241798925253401410628961016927) , } ; const xlf_type Poles_12[] = { XLF(-0.6828648841977233230185698902607230174299001013684859476841835817) , XLF(-0.3037807932882541599892583738286603768367541154892904874450630646) , XLF(-0.1143505200271358850105800857719841802152524363943309096223167966) , XLF(-0.02883619019866380371021569628068064589471421148796348420674179784) , XLF(-0.002516166217261335592723139872534647136514907382952945705857963505) , XLF(-1.883305645063902643439348865811401007447714099792616803257191584e-06) , } ; const xlf_type Poles_13[] = { XLF(-0.7018942518168078624531414012287862764084947122814778574940273876) , XLF(-0.3331072329306235924822781708234713524225941280939732447311497427) , XLF(-0.1389011131943194302117942814286554135390464688327444862031836343) , XLF(-0.04321386674036366964776175493557673701243101733056994259431228149) , 
XLF(-0.006738031415244913998475856269761420409518027617620529866093864119) , XLF(-0.0001251001132144187159609064792562718948551304769602578038425767285) , } ; const xlf_type Poles_14[] = { XLF(-0.7187837872399444505020456522129282595964750174549937757131229127) , XLF(-0.3603190719169610581080084136755666203625002934242563261809150911) , XLF(-0.1630335147992986909768156526491246798608320766128086498946800539) , XLF(-0.05908948219483101879561625313720803556201246447376003919538272219) , XLF(-0.01324675673484791465503043810667202647754110845913635657121884712) , XLF(-0.0008640240409533379359827592769202302559672760211495479400001004579) , XLF(-2.091309677527532850132717832806851684714105981958539758162531259e-07) , } ; const xlf_type Poles_15[] = { XLF(-0.7338725716848373495693848190690037113259546621452360440203438709) , XLF(-0.3855857342784352080140541786889509307194303153524667924992133235) , XLF(-0.1865201084509643457413145787278752891979317998887406861617926908) , XLF(-0.07590759204766819930382221724156900984700061847619931049627371902) , XLF(-0.02175206579654047146207140882909654431610437531706224317179848268) , XLF(-0.002801151482076454872535129646713807156117939572215494561939408094) , XLF(-3.09356804514744232400754981719555295636107606371219648458073457e-05) , } ; const xlf_type Poles_16[] = { XLF(-0.7474323877664685085574427283736589844157583318318004439330736865) , XLF(-0.4090736047572509102690812680965741156739840772723126299135932808) , XLF(-0.2092287193395396948898543031587591726843472742877748490017783853) , XLF(-0.09325471898024062584258002426076364675726611549020054157303989561) , XLF(-0.03186770612045390040617688956292344467322144721011000028899027708) , XLF(-0.006258406785125984912475394080646091554877289512409984227223665675) , XLF(-0.0003015653633069595807102978730177732059471914699696332281898335403) , XLF(-2.323248636421231704917740718046885239052877562647066064799116983e-08) , } ; const xlf_type Poles_17[] = { 
XLF(-0.7596832240719327792045776593358745445069777141400539152276006722) , XLF(-0.4309396531803965785693965637300692642342545429704656904137548696) , XLF(-0.2310898435992718341019926513706630467054625898374489974155007084) , XLF(-0.1108289933162472532114988831064802678108300620508942380785689952) , XLF(-0.04321391145668415769660776100790405761679462981400693616624507606) , XLF(-0.0112581836894716022181896293491206662733281119456807152725245783) , XLF(-0.001185933125152176738908831907655560107159111098519587692500186341) , XLF(-7.687562581254682728955673310667501934979112121779492801030611799e-06) , } ; const xlf_type Poles_18[] = { XLF(-0.7708050514072084875653763894746947171002896449375376904119489378) , XLF(-0.4513287333778174878316722609027777273866881227917307297897186109) , XLF(-0.252074574682638763584472896601959892330780571622277331198978582) , XLF(-0.1284128367967145725789350332258624076307639057064238630350586344) , XLF(-0.05546296713852201088203462904233467368282674158008500152656817855) , XLF(-0.01766237768478522019536068887024688998012080586932492506861471418) , XLF(-0.003011930728994830197965802954651925043296476246417439420870245334) , XLF(-0.0001063373558871366640189637199846302574071259717460536890338053772) , XLF(-2.581240396257155753009061408465308128783279309333058163208326594e-09) , } ; const xlf_type Poles_19[] = { XLF(-0.7809464448517323032921331480485990736133673529544175911917193925) , XLF(-0.4703728194676430247813348528935411602052750451796672897113815149) , XLF(-0.2721803762830344869914311059697431889279401637302705823402891798) , XLF(-0.1458508937575645126564144435615800230623243319927531764237235694) , XLF(-0.06834590612488046381791503567280010798024149861224965610323129877) , XLF(-0.02526507334485559573805086677780212256467747396675298316895380627) , XLF(-0.005936659591082969962210417817284936502087769359983070616251584598) , XLF(-0.0005084101946808165632250947968083231925294339604915334694819054562) , 
XLF(-1.915478656212247986385955601447744124330070942089945639492293327e-06) , } ; const xlf_type Poles_20[] = { XLF(-0.7902311174807248549332936375457893542860119537154342460770408151) , XLF(-0.488191260639127932238629607901884051326457227480964476948377784) , XLF(-0.2914216014747825537725469531269362807164011247774040738675675161) , XLF(-0.1630335348502630805820942942606579464754200839866649328459354433) , XLF(-0.08164811561958893557999175584335413995558571097225276455879102418) , XLF(-0.03384947955391195844315351136577024243837182069205517957617886762) , XLF(-0.009973029020058727414581506443127589654186608606974154953533863969) , XLF(-0.001468321757104340435450206390769993910611464623851653381355200669) , XLF(-3.774657319751902549594020029881206053580025869672360111786256136e-05) , XLF(-2.867994488150105473662242044890074684469234296770066865401193409e-10) , } ; const xlf_type Poles_21[] = { XLF(-0.7987628856657315757203801383612220144086561675303372460663002769) , XLF(-0.5048915374483703361482062601022197153890772443749564975705740451) , XLF(-0.3098231964157903198162558904555100588543835587935743155364140584) , XLF(-0.1798846667985358388109211672203553781384692593141807071537453683) , XLF(-0.09520081246073120058708495270851148142240478223201471931334499159) , XLF(-0.04321391844071573523188109232276280721250761578908178194021699031) , XLF(-0.01504549998729455474001276889463311669647492626006472344511038256) , XLF(-0.00317200396388417259157655496824079224784953069498238455598383488) , XLF(-0.0002199029576316323303215070397334940614688205773541255587462877769) , XLF(-4.779764689425363162589596178024910556810143850138812709242039102e-07) , } ; const xlf_type Poles_22[] = { XLF(-0.8066294994690704381784577435835683355058440289815194105900519751) , XLF(-0.5205702359699174392369736031713993379968280185938642109865886427) , XLF(-0.3274164741105744655612120945481846353114397459808033338881750675) , XLF(-0.1963528261275153599272899298935673605843126849556521483006046854) 
, XLF(-0.1088724519840775961967031236282505637337642412423072789898409219) , XLF(-0.05318160458588924115751824694892656984029762515468591039270267612) , XLF(-0.02103566093067810131733673269896154114203893543044501644112959873) , XLF(-0.005706613645975125961234826367939161186678825260186810986162279501) , XLF(-0.0007225479650742529170027747868538226466859987928309007310242102331) , XLF(-1.345815498330488082616558794512746781444517394742113720906213664e-05) , XLF(-3.186643260385932993403597953003403529026022997206807142846413846e-11) , } ; const xlf_type Poles_23[] = { XLF(-0.8139056235618003511160832839269542710216365904829182798424493152) , XLF(-0.5353140836684856561212427588911268204417052058992365104857921705) , XLF(-0.3442362769193780815082593918406731698782468720833165382304015715) , XLF(-0.2124046605401930352247334461921198933385766551087644460716790274) , XLF(-0.1225611609919387285989759258209244090173671464974510953899350814) , XLF(-0.06360248015411367696237081559934765507739077553650250339847155377) , XLF(-0.02781166203794071880492535613955159051479715041944045901584377782) , XLF(-0.009079595335295149302724717051209901706254818465340811597141835918) , XLF(-0.001711271446781703660793369164854098249027950784315701693774921441) , XLF(-9.57339435007398175334370890753995602326869462772976757522717198e-05) , XLF(-1.193691881606649475531914423897375002871006593953056103637512138e-07) , } ; const xlf_type Poles_24[] = { XLF(-0.8206551810729532156848190473086285634380511372977395325255480247) , XLF(-0.5492009639573297506029461059136924670588956118496358242197130164) , XLF(-0.3603190713767427976422377445566057753967064903947126273791578148) , XLF(-0.2280201472279068744951961560602261536599135945349852345747884729) , XLF(-0.1361885000680127106266852561836016842025719709716392129723196932) , XLF(-0.07435149727736939900226484262517080976959261781922548817880852866) , XLF(-0.03524412666583196627738452246600390373343076990931685283556371198) , 
XLF(-0.01324637532530081046738011746362276892101808265885329670404363107) , XLF(-0.003297682623486454663018723727298904531875022240846464545776918981) , XLF(-0.0003580715441086799517180636275229816571893919159569076806601582924) , XLF(-4.812675563310482672752116792028140409899187591565777090081223269e-06) , XLF(-3.540708807231918732476161328832045170720301334926365154550694474e-12) , } ; const xlf_type Poles_25[] = { XLF(-0.8269332126349767039536046203302337317844718793583579210028891483) , XLF(-0.562300867680010075615339941705359654958070477846209224585448454) , XLF(-0.3757016770096310556705494107763768501576330898118634499426110874) , XLF(-0.2431890684152279829614631308632634458341487203556387167850475983) , XLF(-0.149694464390990002697418020451765628593153014908409895366739785) , XLF(-0.08532560756216579704117185688473574146181888947184193945606167234) , XLF(-0.0432139182679600323290365545980944075204028971941545119308763697) , XLF(-0.01813436080558409621046323364959820919088282979160784076616637111) , XLF(-0.005534084441487329840365165647931103570866714934686263509692316399) , XLF(-0.0009299873306092404593718224645733852844290861912017824499403961315) , XLF(-4.187875052377163435464804985213326332107893175912333792888710719e-05) , XLF(-2.982478285796372981678270626577953206842887466839166029601480287e-08) , } ; const xlf_type Poles_26[] = { XLF(-0.8327873634529700580425956057636158354428269078244656773169746416) , XLF(-0.5746767631908141696308575162797660334002247637147295013301378323) , XLF(-0.3904204283620341355956031741141048059108648997510733921739160726) , XLF(-0.2579084013447033805967347937013898442368925540218292437269621744) , XLF(-0.1630335348216207883671433265453213548141587465724585702337761475) , XLF(-0.09644049675858785528252082983409831400920777167932652742564346496) , XLF(-0.05161516247228332846551007467288419671247505561613920738425427913) , XLF(-0.02365917403431692583041444393488441803059901802519424862659885329) , 
XLF(-0.008424934725813554238984216777947408817561185396254943927068045688) , XLF(-0.001920011836343276892788589768295800806467265571069546675768503616) , XLF(-0.0001784141661193633141006905730926794065324105614061195213247225048) , XLF(-1.72453039221046446691317823363963175837149616659442073510620035e-06) , XLF(-3.934118864515371781248316568521890530388409112936934377274810852e-13) , } ; const xlf_type Poles_27[] = { XLF(-0.8382590821014597701203030396072828509655577771509714233091817271) , XLF(-0.5863853797684231139786093581652022714013749014396205771002823535) , XLF(-0.4045106347246844323157228276106909233093664941429080876157738678) , XLF(-0.272180376229601012226312104374117418470314796210649818535048906) , XLF(-0.176171577593237769260878301446307942921591427559888820575898633) , XLF(-0.1076275243906049218438449209355682072315081677169028900962096741) , XLF(-0.06035576309668649760051296356543150268415454102548685960527181155) , XLF(-0.02973460374678660315296461597390414378869303198628675252413705228) , XLF(-0.01194239633191651155429095081546769400433414825855333988737411513) , XLF(-0.003398965898172871421994175583610690257655645339749817682769112609) , XLF(-0.0005082738135987045967033201076290063904583491673040769265276190783) , XLF(-1.83863717083283226335666550100440075767671437179135903626413165e-05) , XLF(-7.453737087428511227281352008850694658452951597264931151361779818e-09) , } ; const xlf_type Poles_28[] = { XLF(-0.8433845940335697327620979814421339557729383849789504154891521657) , XLF(-0.5974779061705426952715719545925882255856081976422230541781315342) , XLF(-0.4180062448978233890294706967407767049597049821465408946420339303) , XLF(-0.2860110251917676037727830882739365545362954330924689512958222167) , XLF(-0.1890834149624572449013119889044253081757892779575779157818425185) , XLF(-0.1188310346590514029276147642028441774360383590515184634504142881) , XLF(-0.06935667734278621035033686384669203272534727947051358856607541783) , 
XLF(-0.03627817198644646968927939247374571890234386194695592481681508351) , XLF(-0.01603960247911933790027956028994526888519143240744611981000577388) , XLF(-0.005399939151044706317754683424710989393682567643539954358821435295) , XLF(-0.001124473968853076633238431246782069906092193168667140497266094101) , XLF(-8.92772318754021921924487165064105370760868416010510825241999918e-05) , XLF(-6.188192321062351389282815986762183605802895405571164085405977482e-07) , XLF(-4.371242485810543843473735883734189077880275959202918768787132822e-14) , } ; const xlf_type Poles_29[] = { XLF(-0.8481956975631381369279712504506719747780655943730513988835529113) , XLF(-0.6080006097327870264529380052457492228776122471012456793760593247) , XLF(-0.4309396531921757538439504082304478699802287742339303845026397798) , XLF(-0.299409094278988118709694298044146784157754689464631775211890117) , XLF(-0.2017509205638139981207049644225579330514883488697396176604902006) , XLF(-0.1300060717335343394702785641868817666903514694464950426411133639) , XLF(-0.07855065547112690760891665743680039037354134185450452866157544345) , XLF(-0.04321391826348333423277774691288523219177010041751862435576842583) , XLF(-0.02066024751435230681809652075146407276551297215303958926398979035) , XLF(-0.00792582821369559765627187731651237889770310786172227205028825455) , XLF(-0.002100199856436503618329184465359189752696089460776825340850275862) , XLF(-0.0002790426068554639276105549448921194704648004130002938449591201897) , XLF(-8.094582441984409990310718354798851107543606137925062869408782119e-06) , XLF(-1.863088870511326052369351682118102095185217108464409117916298728e-09) , } ; const xlf_type Poles_30[] = { XLF(-0.8527204188873781146594681300124141875559652045399938128715825351) , XLF(-0.6179953830117889231405877079394802978883853957072802322233160192) , XLF(-0.4433416024105959222699191024048622397707713367332537381488443229) , XLF(-0.312385225626562158841058646135479643676536592509814188657404174) , 
XLF(-0.2141615233206583148231085471027137107279207520033109007938207413) , XLF(-0.1411164738701920750459492747398355136725916523234446788894915012) , XLF(-0.08788081801854613176014562004502497336032216679963396567234973739) , XLF(-0.05047351681146979841967981172475549486973658960072458590729594154) , XLF(-0.02574493385099440911538076059724497153820987481170045943673148698) , XLF(-0.01095839629287172842698413310327426362928130402503158742791216183) , XLF(-0.003482288669069040241834549094690772217582796376804567159145023786) , XLF(-0.0006616482677279396550449026566993868687720784853285375490860328414) , XLF(-4.482596596998705327314716498792059075387831881897735585689104483e-05) , XLF(-2.22269051335366943425960529138102939881992253252418536823699629e-07) , XLF(-4.85693585632446413022984934041620945754435531034158717983414012e-15) , } ; const xlf_type Poles_31[] = { XLF(-0.8569835543745790006692660176613903978235236113685321985797685177) , XLF(-0.6275002254374090295625305245199459527611076546929184941211088856) , XLF(-0.4552411529474754870125291859636648372320011901586959086311909406) , XLF(-0.3249513417693065471598202874462884925973577281876529518062349127) , XLF(-0.2263070292125006212432694033802036554671095695492104461360681813) , XLF(-0.1521332995713027103088557046255432838293121337329509714701607771) , XLF(-0.09729925878028641707609812795297238388135853571553997308037946095) , XLF(-0.05799644242639943959494223261278172620094142936286773950122429818) , XLF(-0.03123508345332988342320715140725308412210179149409314511539688198) , XLF(-0.01446609949032905075952120070199286833657619175444924733096165779) , XLF(-0.005292679268065068164838563470991247284500582212313629199311876432) , XLF(-0.001303973183898104304083796237137082634737593376712824510513537086) , XLF(-0.0001537511493585799622361256146865309659915332469334283743248429426) , XLF(-3.5711513081322671069286563588876077644083947779678621581936046e-06) , 
XLF(-4.657236728778484597058209578183411059230343300563713079312932431e-10) , } ; const xlf_type Poles_32[] = { XLF(-0.8610071220775073984545025884174647651873342531478458248801207525) , XLF(-0.6365496672844416072579682630981124647076099496259688957518003072) , XLF(-0.4666656963925545842292615356620524236211323834923337799059180971) , XLF(-0.3371201820284790043011923387339551881746442456200868620563921785) , XLF(-0.2381826906635858373340295751961314788024671940456938940165961336) , XLF(-0.1630335348215802295105412986273882397126963886458718842033727226) , XLF(-0.1067657601912722166551080865931935785675861772344540943455026145) , XLF(-0.06572962575614472922119768199720706395464914974888004512385744442) , XLF(-0.03707517947165689966943488764775622423088672123093473823346986939) , XLF(-0.01841002303519408058508537427084853973752496338830604506274024818) , XLF(-0.007533364010056747507177074686484737587952268224142799783978323192) , XLF(-0.002256727280342519009243907948704147629674856107478718283783368275) , XLF(-0.0003907975105276824894206940606710036419613843135574836014341309415) , XLF(-2.25689522301406632236836294375985204961356448661504774692761827e-05) , XLF(-7.988927839078504108724438276285537087646234418866499870958201165e-08) , XLF(-5.396595313947431360582308041198731074375556607940834970394538893e-16) , } ; const xlf_type Poles_33[] = { XLF(-0.8648107396863688016557219577627117613740501334459417369612973419) , XLF(-0.6451751428015154579730662039203667631320950694797143070152669499) , XLF(-0.4776409984675163764629779248436891401227532476094464300149541618) , XLF(-0.3489049538475770937469889124693515126882768453750585940131312858) , XLF(-0.2497864693298141795853141466596027633821329905647261273822420384) , XLF(-0.1737990337412069900543274825289003508247496408633109766001367353) , XLF(-0.1162466535813227634735432582332078210348909125051792336777833672) , XLF(-0.07362686063059345521691441309339694564214463979560969267774880714) , 
XLF(-0.04321391826377283621523198782502830748757882697443220936696418131) , XLF(-0.02274804078697487978955811906660017817649880159892115509883731691) , XLF(-0.01019192053500615854898240748305761043473774820921200129062596151) , XLF(-0.003552018463170819032019002147056237136233449836891093400369306572) , XLF(-0.0008127973866455559381973382214548059560731448019530608139739299597) , XLF(-8.49681467828939693602893765540687251950218465964645214688094231e-05) , XLF(-1.578071702213531896356749679837082743345320035321838037161612955e-06) , XLF(-1.164240937862773374270370353664588526443504516547867172644971029e-10) , } ; const xlf_type Poles_34[] = { XLF(-0.868411942510091296418192858342830867706147953079911531537804392) , XLF(-0.6534053187195726880325209718789722564773514329072110885304557506) , XLF(-0.4881912606388469230758719263538286201213595461658537171317618765) , XLF(-0.3603190713769044598971420551578760292660424066730323100982338825) , XLF(-0.2611184504267236207918374987104792017107442476619846966422450009) , XLF(-0.1844156510074104335265741903507421423784375784641641435814659504) , XLF(-0.125713828924540961029769529606132339915978500886858165075893666) , XLF(-0.08164811550866748008243902441268118591133748932536970143016310113) , XLF(-0.0496046783509799933361487402166251061242552782544855823840231184) , XLF(-0.02743756763176555494239788327004066705311239344350355316655813824) , XLF(-0.01324637525145965067990480147017405091904582300386826800688010616) , XLF(-0.005204974773903469112731368122962788298472575120242691293903018908) , XLF(-0.00146839633965869992758814019288581419517739426256657781805763164) , XLF(-0.0002315423225987300653263686287845986665627360964885761526307610157) , XLF(-1.138845488453173720559090869349695013665749123053966886063934007e-05) , XLF(-2.872789784985329839346948561380530010617370484875594835519237793e-08) , XLF(-5.996216987387188258386427309101063013984749098381037237664635103e-17) , } ; const xlf_type Poles_35[] = { 
XLF(-0.8718264522846033999342394061195880256470727705830912665217989902) , XLF(-0.661266383704797300961590243559889585396787970468597759497540677) , XLF(-0.4983391929337384452379841548311697188141004659525090030081443309) , XLF(-0.3713759605149176482720350542079580293398141128093010991800163526) , XLF(-0.2721803762296094393575708923396865462233310560831404983487216165) , XLF(-0.1948725309279406466291289892705231229598703096700427955532161037) , XLF(-0.1351438852882478430178858011566228749007004654435698322990999038) , XLF(-0.08975883392391288140125772247222514252144984921722775882971425236) , XLF(-0.05620557742615392424089724950105651030992598653610383642011580072) , XLF(-0.03243728423609834830880510161503194535520176987756301783559450961) , XLF(-0.0166690219153114058357659185710515802172279551695696752297456289) , XLF(-0.00721702343640738546865426044583219398433188989439217014848981244) , XLF(-0.002393632153391934976738453336174253298399748468472159832875633091) , XLF(-0.0005082814595413963602179947181220570751322338012057876016538467729) , XLF(-4.707210567991254277165743888852309796308110097300718701320097425e-05) , XLF(-6.982167361691684787303327536741801670656721991854478901847667883e-07) , XLF(-2.910506393324535687374866149601235708803686266342619866044464595e-11) , } ; const xlf_type Poles_36[] = { XLF(-0.8750684054438948847452437373406664292611236884066242889288642594) , XLF(-0.6687823036766544255379859722251323680047749665524258229415040719) , XLF(-0.5081060927404221239947201115147411334358304319098157872485907861) , XLF(-0.3820889147120881724540180154136545429612621860660454125962716) , XLF(-0.2829752736425094565610373986802070821391575930451883961728558898) , XLF(-0.2051615241418207572727997969528817223652452129707106074838495989) , XLF(-0.14451740754635553297283498330242538411493124191880429538112964) , XLF(-0.0979292687537774729071245716398629162926615109445190283416048203) , XLF(-0.06297929294836479990958242477558204390841384178857692323889463793) , 
XLF(-0.03770814705533074950721322074860584689824799331481849682475799349) , XLF(-0.02042922248740917225652419002908239423559601850491465766821897678) , XLF(-0.009579426581121988184135814928911638479896594280206909496592497805) , XLF(-0.003611220147146754138028542854031671084901914644652113492134872149) , XLF(-0.0009586466637195032820566510915883769629669327808512967663043218798) , XLF(-0.0001375423203276128227751325930784359970998592216808835620834600289) , XLF(-5.757252845898398275684359808988308159656730828735893183687005237e-06) , XLF(-1.03338827926791787654999609776125053513014082753969839763193209e-08) , XLF(-6.66246330967919499434053724033473943586232453420054571477522732e-18) , } ; const xlf_type Poles_37[] = { XLF(-0.8781505478013884659147982039146191429554172009001990824695919527) , XLF(-0.6759750473095329121930507307346648240013232615900152212905613459) , XLF(-0.51751192597684673535943933018009947621992684254357013566134864) , XLF(-0.3924709896295193646603292631813403487513963808345868441557056082) , XLF(-0.2935071562988579964793774104326865572720481250740885379204084173) , XLF(-0.2152767082291064937186817220600191746368374476873378239389736579) , XLF(-0.1538183533518710859528029818294132876109657645325606260063974945) , XLF(-0.1061338716626476411426469605322070807913023243648144140725098819) , XLF(-0.06989275733426103265981254390513881080482550264003339404594638648) , XLF(-0.04321391826377260296705031812585452479799546997916006826834407192) , XLF(-0.0244953631883323823647994032926557267827645892519567084880868072) , XLF(-0.01227642795169495998563028454995526435467331188779107959837355095) , XLF(-0.005131933739471125785686386431120910226126867368175623514253818484) , XLF(-0.00161854045004396671227841390207027781540938842753041362501455449) , XLF(-0.0003187154552604863395905178506217389398462053401287750680304108978) , XLF(-2.613169289577131021924859793304113754318711850182299727934643579e-05) , 
XLF(-3.092266885451066382879815399764181243826377944052970487313410121e-07) , XLF(-7.276131065451551553705333358570156901754325759020523687046512448e-12) , } ; const xlf_type Poles_38[] = { XLF(-0.8810844012627599734510052333679306661258475871582915975183354138) , XLF(-0.6828647854893554114611749196595961022046669812516270865341674094) , XLF(-0.5265754081532769568125705273304797865876876752529184222498852699) , XLF(-0.4025349275748515179303532349494372237459313802116942382392916943) , XLF(-0.3037807859294119250547972933652790371407468852466401107412552645) , XLF(-0.2252139929681739289389520458526953685645416021161753009872430301) , XLF(-0.1630335348215804651084659583007473631897139758517007075874474948) , XLF(-0.1143507456252265693234335556374301447233223914711100984308679374) , XLF(-0.07691679639577583589865395330466435228808416915862999472469522428) , XLF(-0.04892138275231939194828291450254472790783561299918000372996496771) , XLF(-0.02883616221709854417423820419081887870825812026619930751611767068) , XLF(-0.01528779400217048844947648824603569566787484444193002536183233717) , XLF(-0.006956830335738254573880471095541693737462538024945412076596100204) , XLF(-0.002514189750356696521640250002282235456229817645538615794802163848) , XLF(-0.0006276117067541021362760836858935686241283161765630151995535965938) , XLF(-8.188256383484340050684230163387384565153875154394489527208557087e-05) , XLF(-2.9148971463249452196845816069833301908545518066006530514321514e-06) , XLF(-3.718130253557185189659320961860860784155151645390455257007783541e-09) , XLF(-7.402737007448795866796676541208151168640091132168127867274461559e-19) , } ; const xlf_type Poles_39[] = { XLF(-0.8838804071427821403538573400038413542684225267333774944945296725) , XLF(-0.6894700680083400659184881413107846576898799429348201246959462546) , XLF(-0.5353140836684927401585843610161871270364923529126416793493216935) , XLF(-0.4122931047661460010046659312530330357059065205259364136473437579) , 
XLF(-0.3138014810230074140709270572250097647882280987948234857948216725) , XLF(-0.2349707946459764815730416069548419384943178603174648217070090128) , XLF(-0.1721521807622996270283830813993210037412921179142107951912987013) , XLF(-0.122561160993944562864220405676381709565252994327507482588781223) , XLF(-0.08402575235660301965392963042072378981949820562143619688073452448) , XLF(-0.05480036756601220303752327387292872471060304044883900751548376047) , XLF(-0.03342150293797134126219152488193779932419906557898733772011352821) , XLF(-0.01859074722483924127966095754884896879187571795331881586252931603) , XLF(-0.009079595195884276359663059407364285645767778417547513531325443767) , XLF(-0.003662104378870138633637104472977441640395396108077576162517066551) , XLF(-0.001097586856737514722051463876016155621381043342811812777686957317) , XLF(-0.0002003068879448743003281295502194398955674433171563053669149427904) , XLF(-1.45321316247905172365968562238955363512217736902440957179898996e-05) , XLF(-1.370545029060038177725089857260695152348902198174582425063614265e-07) , XLF(-1.819013794625787766003677964763074328080221008080626474426009181e-12) , } ; const xlf_type Poles_40[] = { XLF(-0.8865480498249417239427008642652960486190151746361567239906664105) , XLF(-0.6958079803507162016170415352239725903926397620664406565492174346) , XLF(-0.5437444022594512304292991538755948156506013584800660408864804425) , XLF(-0.4217574960836135760735709524382669355715274973575640737089458263) , XLF(-0.3235749633481036665976933546361683677573000672674848853480301136) , XLF(-0.2445457668079241226571226562971627843396378500156285307769417512) , XLF(-0.1811655669715627651723204505658858064383274888853659588702144339) , XLF(-0.1307491316833214492154291590394745734318271118026811489274112894) , XLF(-0.09119711540837702671005955403236223294722560143602328150342637692) , XLF(-0.06082364166735568921630568955938537615700163020351593484887383573) , 
XLF(-0.03822293082901784992510790132362458922098823926664849012758909777) , XLF(-0.02216137502477028067168835200751675594489780033002192164202868613) , XLF(-0.0114886482426543975194893315944088434725923778733718568173041969) , XLF(-0.005070168042357454064418051147786571569953137135000204986554756488) , XLF(-0.001755563044393195285060352517031713953284630161352714167548947187) , XLF(-0.0004118650546227838083044596471981422158427970594272683285924695645) , XLF(-4.883749869313317942908248899528742439137511496965237578555703579e-05) , XLF(-1.477664068328894179909487136579716007242828522498197961890553758e-06) , XLF(-1.33800265813914219921316176928755813141341326695004689445150919e-09) , XLF(-8.22526334047607454535486271035257115822466847965542839458599925e-20) , } ; const xlf_type Poles_41[] = { XLF(-0.8890959638357837873469022953836399246288449125671096352859832732) , XLF(-0.701894283045822693090333392530384140994025066196092798504051686) , XLF(-0.5518817919367211668766568069591515221233174524575627655885217399) , XLF(-0.4309396531921752276110843193597059893489616747715943193038832713) , XLF(-0.3331072348755986375573819476643568411736978618085493378158771037) , XLF(-0.2539385772414767539599720891513983314085613177649191147268291957) , XLF(-0.1900667038736958640880944899109356184028768960348825827085321728) , XLF(-0.1389010462253641359590036042388613572070161268938288702402679418) , XLF(-0.09841117686633293944432583376322831792096348808502462072703927122) , XLF(-0.06696674766202966506890676491489340355301803471998128364424764985) , XLF(-0.04321391826377224185679885786371453233952356822996869493387370648) , XLF(-0.02597561948079389846050245809589723368039309202890154753255867357) , XLF(-0.01416888539677724491675220290304631912898640250999730674086817242) , XLF(-0.006739186449046984038080911435844015053701108547912994938306338808) , XLF(-0.00262104180821929193652757454556567953998534980504767823799715708) , 
XLF(-0.0007461328733157922833797184262927803175879006213308751009054405432) , XLF(-0.0001261350888067020144430598696989766129159126927479792117631999536) , XLF(-8.093502330497814676653757956531789181149791636306257030476742624e-06) , XLF(-6.078089086680648675357094830545843973084443584869234951774044295e-08) , XLF(-4.547507808420938896720625846449621216033643220349188723154739851e-13) , } ; const xlf_type Poles_42[] = { XLF(-0.8915320268698522159651647309409898698731488935948280938605312513) , XLF(-0.7077435357376586388858410940687095792858920465042904375498401819) , XLF(-0.5597407280298025835090097476707528895889058210710629541672853992) , XLF(-0.4398506928520543702522960970484893773080325900543098559008872809) , XLF(-0.3424044791801646434681043343247679811771476619885264677273003911) , XLF(-0.2631497229248989413568644924188030250797685093594147906749133287) , XLF(-0.1988500723607262106739088511038080175064902350929060755680797828) , XLF(-0.1470053478138674280323752622112899295178614489003852780125198287) , XLF(-0.1056507102479482771048677935189569652285606137594146388606492182) , XLF(-0.0732077991194162305895413659749245559880295171793736680711999188) , XLF(-0.04836997283028642285462458648811236312434105938659972391425244394) , XLF(-0.0300099472525122720946746368368032003662655931894695709023281059) , XLF(-0.01710304346825772288069577457560739794564483583185968574040843086) , XLF(-0.008664489321413533350337112095657616565847803017696510225767787322) , XLF(-0.003706304925360371606803185866509402690510381270201288741985858992) , XLF(-0.001228913565528939125078519304902651811485016604792305852587191118) , XLF(-0.0002708325035954437436783014078654001809518243058307399683824956403) , XLF(-2.917476017494444073456549303554553293863110238727813211098795334e-05) , XLF(-7.498609369472629308544215472257229740522463123374387877904055657e-07) , XLF(-4.815479655866042128205442074833506699433411427882331039523924082e-10) , 
XLF(-9.139181489029075064260505998730771341799704346795797559508345866e-21) , } ; const xlf_type Poles_43[] = { XLF(-0.893863440867893358608828901893563497810908751596147384916450371) , XLF(-0.7133692078362500762254293013499779757058403109135527663592579498) , XLF(-0.5673347981712786697944014046050991485878993432456727187652533242) , XLF(-0.4485012929513466523033762513202365413857949704604126689088403276) , XLF(-0.3514729825996737699863520720668652637190478996596308842905246751) , XLF(-0.2721803762296094398715259352720103937021937668858261345535146447) , XLF(-0.2075114001378054455728203676821808935160404127100836552303700095) , XLF(-0.1550522574708870535187643983914013179243763048323631791625734408) , XLF(-0.1129006825209351018462604631594150605652417220648747297441159382) , XLF(-0.07952726494797824527452900402745299894178991587219038642874328136) , XLF(-0.0536686430180897847565647778831917630437506181627560030120559346) , XLF(-0.03424178287301239611979002915701822580782004283452107789848164312) , XLF(-0.02027272969155223317872331706788925401264853242898531741842110359) , XLF(-0.0108373831470151738767331253445189914819894105048839802154604759) , XLF(-0.005017257442672953433759649112858697950215891223015279326312800931) , XLF(-0.001880716222005879490408415736977615798879384426669420484772851697) , XLF(-0.0005082812655905478544414023088409706846704423690510655413901375747) , XLF(-7.956174663838349201171982112646408318851297885982006145797839986e-05) , XLF(-4.513325250611668622670878692265663961000697908990863106300239453e-06) , XLF(-2.696759817112634783679825031765393335855971365215612678489021744e-08) , XLF(-1.136873200559593259398193993017687665952388539500605071159134504e-13) , } ; const xlf_type Poles_44[] = { XLF(-0.8960968028992146665627409301695607302205014963225443894685342127) , XLF(-0.7187837773708819352136947096079319017090596410656272550377272326) , XLF(-0.5746767631908139827528406251451784714430500492740761209250364974) , 
XLF(-0.4569016943466847335506940555269725780044767461717720240889695593) , XLF(-0.3603190713769044571789712579438782147170830151763222752366173198) , XLF(-0.2810322569159787547930034286249012421326484907559484425330827152) , XLF(-0.2160474721108212555687295004555083560208818252478624520061428472) , XLF(-0.1630335348215804648625873336213043131376530422779831554484065536) , XLF(-0.1201479953744980079922840269252211154399742809514526337471659156) , XLF(-0.08590775414808882023294983578453695403030749604255062341580882368) , XLF(-0.05908945890328844788754841773552247793962609244906350553590671065) , XLF(-0.03864977109894073793200650011114251143377588126854134896148539826) , XLF(-0.02365917402130976673278605911930246830033721946504547153497426121) , XLF(-0.01324637525169005773742919987891981609058182960094684094561128642) , XLF(-0.00655452780820816870084755830167570427328585406514796089829324295) , XLF(-0.002716322197289291416048524927342344422154897427943894833746072091) , XLF(-0.0008621057366132465648673765980953159889884901490970504618348817264) , XLF(-0.0001784049129824874983750749912243393871433082536567270149530538835) , XLF(-1.745255401918100730688816665717770286170547015610198053753240676e-05) , XLF(-3.808577626111630594159094735088815707265727307233566729871581692e-07) , XLF(-1.733235166498085104752211857283419959035264593098427257293460137e-10) , XLF(-1.015464609878786614079491151475195110882861668111464956725349865e-21) , } ; const xlf_type Poles_45[] = { XLF(-0.898238167312102371349291053804821640312192793555562471215476875) , XLF(-0.7239988194534050450565699116216409286927740874829315191220161918) , XLF(-0.5817786139867798902425713772156262872378141049843191723750717028) , XLF(-0.4650617070250951283723883102723573446594725565665736785548979181) , XLF(-0.368949061752944921465166012473919422035541674539231445258506868) , XLF(-0.28970752546932497793372776574871614478649833989202096321840715) , 
XLF(-0.2244559694078480600273876525332568970050856529133041456485020925) , XLF(-0.1709422714709099403943210263664855143389875812409323306553227399) , XLF(-0.127381255033133456657313390242003823329182271006023449160273537) , XLF(-0.09233380888114754628108377637591134087471387005282428247572939459) , XLF(-0.06461383375765931115919652801084614876607520051980950747351586994) , XLF(-0.0432139182637722494695217347313226126861928859040447373499280064) , XLF(-0.02724376234303231214714221804415821981624772806286889997360655287) , XLF(-0.01587815723560460470494461070583947317801425002006642339597852799) , XLF(-0.008314592328776581400858429423285613326485676365826471754960988952) , XLF(-0.003745054905481998368479047563539474655222912170792398409464371879) , XLF(-0.001352465418015474711809167945824158348426517243049641995552176944) , XLF(-0.0003468798403437999650432616403275257135294579726410741017702179129) , XLF(-5.025781166635491597024308150332417618545992094836478580674841117e-05) , XLF(-2.51960760849922105778664437501872178787878497284421041128478423e-06) , XLF(-1.196948365983150080846074019812402782497101718879224902675681104e-08) , XLF(-2.84217772584202427130585839790790701423165122912176737813096462e-14) , } ; const xlf_type* const precomputed_poles[] = { 0, 0, Poles_2, Poles_3, Poles_4, Poles_5, Poles_6, Poles_7, Poles_8, Poles_9, Poles_10, Poles_11, Poles_12, Poles_13, Poles_14, Poles_15, Poles_16, Poles_17, Poles_18, Poles_19, Poles_20, Poles_21, Poles_22, Poles_23, Poles_24, Poles_25, Poles_26, Poles_27, Poles_28, Poles_29, Poles_30, Poles_31, Poles_32, Poles_33, Poles_34, Poles_35, Poles_36, Poles_37, Poles_38, Poles_39, Poles_40, Poles_41, Poles_42, Poles_43, Poles_44, Poles_45, } ; const xlf_type* const precomputed_basis_function_values[] = { K0, K1, K2, K3, K4, K5, K6, K7, K8, K9, K10, K11, K12, K13, K14, K15, K16, K17, K18, K19, K20, K21, K22, K23, K24, K25, K26, K27, K28, K29, K30, K31, K32, K33, K34, K35, K36, K37, K38, K39, K40, K41, K42, K43, K44, K45, } ; 
} ; // end of namespace vspline_constants

#define VSPLINE_POLES_H
#endif

kfj-vspline-4b365417c271/prefilter.h

/************************************************************************/
/*                                                                      */
/*    vspline - a set of generic tools for creation and evaluation      */
/*              of uniform b-splines                                    */
/*                                                                      */
/*            Copyright 2015 - 2018 by Kay F. Jahnke                    */
/*                                                                      */
/*    The git repository for this software is at                        */
/*                                                                      */
/*    https://bitbucket.org/kfj/vspline                                 */
/*                                                                      */
/*    Please direct questions, bug reports, and contributions to        */
/*                                                                      */
/*    kfjahnke+vspline@gmail.com                                        */
/*                                                                      */
/*    Licensed under the MIT license; see the LICENSE file above for    */
/*    the full permission notice and disclaimer of warranty.            */
/*                                                                      */
/************************************************************************/

/*! \file prefilter.h

    \brief Code to create the coefficient array for a b-spline.
    Note: the bulk of the code was factored out to filter.h, while this
    text still outlines the complete filtering process.

    B-spline coefficients can be generated in two ways (that I know of):
    the first is by solving a set of equations which encode the
    constraints of the spline. A good example of how this is done can be
    found in libeinspline. I term it the 'linear algebra approach'.

    In this implementation, I have chosen what I call the 'DSP approach'.
    In a nutshell, the DSP approach looks at the b-spline's
    reconstruction as a convolution of the coefficients with a specific
    kernel. This kernel acts as a low-pass filter. To counteract the
    effect of this filter and obtain the input signal from the
    convolution of the coefficients, a high-pass filter with the inverse
    transfer function to the low-pass is used. This high-pass has
    infinite support, but due to its properties it can still be
    calculated precisely within the bounds of the arithmetic precision
    the CPU offers. I recommend [CIT2000] for a formal explanation. At
    the core of my prefiltering routines is code adapted from Philippe
    Thevenaz' code accompanying this paper, with slight modifications
    translating it to C++ and making it generic.

    The greater part of this file deals with 'generifying' the process
    and with employing multithreading and the CPU's vector units to gain
    speed. This code makes heavy use of vigra, which provides handling
    of multidimensional arrays and efficient handling of aggregate
    types - to mention only two of its many qualities. Explicit
    vectorization is done with Vc, which allowed me to code the
    horizontal vectorization I use in a generic fashion. If Vc is not
    available, the code falls back to presenting the data so that
    autovectorization becomes very likely - a technique I call
    'goading'.

    In another version of this code I used vigra's BSplineBase class to
    obtain prefilter poles. This required passing the spline
    degree/order as a template parameter.
    Doing it like this allows making the poles static members of the
    solver, but at the cost of type proliferation. Here I chose not to
    follow this path and pass the spline order as a parameter to the
    spline's constructor, thus reducing the number of solver
    specializations and allowing automated testing with loops over the
    degree. This variant may be slightly slower.

    The prefilter poles I use are precalculated externally with gsl/blas
    and polished in high precision to provide the most precise data
    possible. This avoids using vigra's polynomial root code, which
    failed for high degrees when I used it.

    [CIT2000] Interpolation Revisited by Philippe Thévenaz, Member,
    IEEE, Thierry Blu, Member, IEEE, and Michael Unser, Fellow, IEEE,
    in IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 19, NO. 7, JULY 2000
*/

#ifndef VSPLINE_PREFILTER_H
#define VSPLINE_PREFILTER_H

#include <limits>

#include "common.h"
#include "poles.h"
#include "filter.h"

namespace vspline {

using namespace std ;
using namespace vigra ;

/// overall_gain is a helper routine:
/// Simply executing the filtering code by itself will attenuate the signal. Here
/// we calculate the gain which, pre-applied to the signal, will cancel this effect.
/// While this code was initially part of the filter's constructor, I took it out
/// to gain some flexibility by passing in the gain as a parameter.
///
/// Note that higher-degree splines need filtering with some poles which are *very*
/// small numerically. This is a problem: The data get 'squashed', since there are
/// mathematical operations between attenuated and unattenuated values. So for high
/// spline degrees, float data aren't suitable, and even doubles and long doubles
/// suffer from squashing and lose precision.
///
/// Note also how we perform the arithmetic in this routine in the highest precision
/// available. Calling code will cast the product down to the type it uses for maths.
static xlf_type overall_gain ( const int & nbpoles ,
                               const xlf_type * const pole )
{
  xlf_type lambda = 1 ;

  for ( int k = 0 ; k < nbpoles ; k++ )
    lambda *= ( 1 - pole[k] ) * ( 1 - 1 / pole[k] ) ;

  return lambda ;
}

/// overload of overall_gain taking the spline's degree

static xlf_type overall_gain ( const int & spline_degree )
{
  if ( spline_degree < 2 )
    return 1 ;

  assert ( spline_degree <= vspline_constants::max_degree ) ;

  return overall_gain ( spline_degree / 2 ,
                        vspline_constants::precomputed_poles
                          [ spline_degree ] ) ;
}

/// structure to hold specifications for an iir_filter object.
/// This set of parameters has to be passed through from
/// the calling code through the multithreading code to the worker threads
/// where the filter objects are finally constructed. Rather than passing
/// the parameters via some variadic mechanism, it's more concise and
/// expressive to contain them in a structure and pass that around.
/// The filter itself inherits its specification type, and if the code
/// knows the handler's type, it can derive the spec type. This way the
/// argument passing can be formalized, allowing for uniform handling of
/// several different filter types with the same code. Here we have the
/// concrete parameter set needed for b-spline prefiltering. We'll pass
/// one set of 'specs' per axis; it contains:
/// - the boundary condition for this axis
/// - the number of filter poles (see poles.h)
/// - a pointer to npoles poles
/// - the acceptable tolerance

// TODO: KFJ 2018-03-21 added another member 'boost' to the filter specs.
// This value is used as a factor on 'gain', resulting in the signal
// being amplified by this factor at no additional computational cost,
// which might be desirable when pulling integral signals up to the
// maximal dynamic range. But beware: there are some corner cases with
// splines holding integral data which may cause wrong results
// if 'boost' is too large.
// Have a look at int_spline.cc and also the
// comments above _process_1d in filter.h

struct iir_filter_specs
{
  vspline::bc_code bc ;
  int npoles ;
  const xlf_type * pole ;
  xlf_type tolerance ;
  xlf_type boost ;

  iir_filter_specs ( vspline::bc_code _bc ,
                     int _npoles ,
                     const xlf_type * _pole ,
                     xlf_type _tolerance ,
                     xlf_type _boost = xlf_type ( 1 ) )
  : bc ( _bc ) ,
    npoles ( _npoles ) ,
    pole ( _pole ) ,
    tolerance ( _tolerance ) ,
    boost ( _boost )
  { } ;
} ;

/// class iir_filter implements an n-pole forward/backward recursive filter
/// to be used for b-spline prefiltering. It inherits from the 'specs'
/// class for easy initialization.

template < typename in_type ,
           typename out_type = in_type ,
           typename _math_type = out_type >
class iir_filter
: public iir_filter_specs
{
  typedef _math_type math_type ;

  typedef vigra::MultiArrayView < 1 , in_type > in_buffer_type ;
  typedef vigra::MultiArrayView < 1 , out_type > out_buffer_type ;

  /// typedef the fully qualified type for brevity, to make the typedefs below
  /// more legible

  typedef iir_filter < in_type , out_type , math_type > filter_type ;

  xlf_type gain ;
  std::vector < int > horizon ;

  // we handle the polymorphism internally, working with method pointers.
  // this saves us having to set up a base class with virtual member functions
  // and inheriting from it.
  typedef void ( filter_type::*p_solve )
                ( const in_buffer_type & input ,
                  out_buffer_type & output ) const ;

  typedef math_type ( filter_type::*p_icc )
                    ( const in_buffer_type & buffer , int k ) const ;

  typedef math_type ( filter_type::*p_iccx )
                    ( const out_buffer_type & buffer , int k ) const ;

  typedef math_type ( filter_type::*p_iacc )
                    ( const out_buffer_type & buffer , int k ) const ;

  // these are the method pointers used:

  p_solve _p_solve ; ///< pointer to the solve method
  p_icc _p_icc ;     ///< pointer to calculation of initial causal coefficient (from in_)
  p_iccx _p_iccx ;   ///< pointer to calculation of initial causal coefficient (from out_)
  p_iacc _p_iacc ;   ///< pointer to calculation of initial anticausal coefficient

public:

  // this filter runs over the data several times and stores the result
  // of each run back to be picked up by the next run. This has certain
  // implications: if out_type is an integral type, using it to store
  // intermediates will produce quantization errors with every run.
  // this flag signals to the wielding code in filter.h that intermediates
  // need to be stored, so it can avoid the problem by providing a buffer
  // in a 'better' type as output ('output' is used to store intermediates)
  // and converting the data back to the 'real' output afterwards.

  static const bool is_single_pass { false } ;

  /// calling code may have to set up buffers with additional
  /// space around the actual data to allow filtering code to
  /// 'run up' to the data, shedding margin effects in the
  /// process. For an IIR filter, this is theoretically
  /// infinite, but since we usually work to a specified precision,
  /// we can pass 'horizon' - horizon[0] containing the largest
  /// of the horizon values.

  int get_support_width ( ) const
  {
    if ( npoles )
      return horizon [ 0 ] ;

    // TODO quick fix. I think this case never occurs, since the filtering
    // code is avoided for npoles < 1

    return 64 ;
  }

  /// solve() takes two buffers, one to the input data and one to the output space.
/// The containers must have the same size. It's safe to use solve() in-place. void solve ( const in_buffer_type & input , out_buffer_type & output ) { assert ( input.size ( ) == output.size ( ) ) ; ( this->*_p_solve ) ( input , output ) ; } /// for in-place operation we use the same filter routine. void solve ( out_buffer_type & data ) { ( this->*_p_solve ) ( data , data ) ; } // I use adapted versions of P. Thevenaz' code to calculate the initial causal and // anticausal coefficients for the filter. The code is changed just a little to work // with an iterator instead of a C vector. private: /// The code for mirrored BCs is adapted from P. Thevenaz' code, the other routines are my /// own doing, with aid from a digest of spline formulae I received from P. Thevenaz and which /// were helpful to verify the code against a trusted source. /// /// note how, in the routines to find the initial causal coefficient, there are two different /// cases: first the 'accelerated loop', which is used when the theoretically infinite sum of /// terms has reached sufficient precision, and the 'full loop', which implements the mathematically /// precise representation of the limit of the infinite sum as the number of terms goes to infinity, /// which happens to be calculable due to the fact that the absolute value of all poles is < 1 and /// /// lim(n->inf) sum(k=0..n) a * q^k = a / (1-q) /// /// first are mirror BCs. This is mirroring 'on bounds', /// f ( -x ) == f ( x ) and f ( n-1 - x ) == f ( n-1 + x ) /// /// note how mirror BCs are equivalent to requiring the first derivative to be zero in the /// linear algebra approach. Obviously with mirrored data this has to be the case: the location /// where mirroring occurs is always an extremum. So this case covers 'FLAT' BCs as well. /// /// the initial causal coefficient routines are templated by buffer type, because depending /// on the circumstances, they may be used either on the input or the output.
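A small stand-alone sketch (not vspline code; the function name is made up for the example) may help to see how the 'accelerated loop' exploits this: the geometric series is simply truncated once the remaining terms are guaranteed to be below a given tolerance, using the same calculation that yields 'horizon' in the filter's constructor further down.

```cpp
#include <cmath>
#include <cassert>

// truncated geometric series sum(k=0..n-1) a * q^k, cut off after
// 'horizon' terms, i.e. once |q|^n has fallen below 'tolerance'.
// with |q| < 1, this approximates the limit a / (1-q).
double truncated_geometric_sum ( double a , double q , double tolerance )
{
  // same formula as used for 'horizon' in the filter's constructor:
  int horizon = std::ceil ( std::log ( tolerance )
                            / std::log ( std::abs ( q ) ) ) ;
  double sum = 0.0 ;
  double qn = 1.0 ;
  for ( int n = 0 ; n < horizon ; n++ )
  {
    sum += a * qn ;
    qn *= q ;
  }
  return sum ;
}
```

With the cubic b-spline's single pole, sqrt(3) - 2, and a tolerance of 1e-12, the truncated sum agrees with the closed form a / (1-q) to well below the tolerance.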
// TODO format to vspline standard /// we use accessor classes to access the input and output buffers. /// To access an input buffer (which remains constant), we use /// 'as_math_type' which simply provides the ith element cast to /// math_type. This makes for legible, concise code. We return /// const math_type from operator[] to make sure X[..] won't be /// accidentally assigned to. template < typename buffer_type > struct as_math_type { const buffer_type & c ; as_math_type ( const buffer_type & _c ) : c ( _c ) { } ; const math_type operator[] ( int i ) const { return math_type ( c [ i ] ) ; } } ; /// the second helper class, as_target, is meant for output /// buffers. Here we need to read as well as write. Writing is /// rare, so I use a method 'store' in preference to doing artistry /// with a proxy. We return const math_type from operator[] to make /// sure X[..] won't be accidentally assigned to. template < typename buffer_type > struct as_target { buffer_type & x ; as_target ( buffer_type & _x ) : x ( _x ) { } ; const math_type operator[] ( int i ) const { return math_type ( x [ i ] ) ; } void store ( const math_type & v , const int & i ) { x [ i ] = typename buffer_type::value_type ( v ) ; } } ; template < class buffer_type > math_type icc_mirror ( const buffer_type & _c , int k ) const { int M = _c.size ( ) ; as_math_type < buffer_type > c ( _c ) ; math_type z = math_type ( pole[k] ) ; math_type zn , z2n , iz ; math_type Sum ; int n ; if ( horizon[k] < M ) { /* accelerated loop */ zn = z ; Sum = c[0] ; for ( n = 1 ; n < horizon[k] ; n++ ) { Sum += zn * c[n] ; zn *= z ; } } else { /* full loop */ zn = z ; iz = math_type ( 1.0 ) / z ; z2n = math_type ( pow ( xlf_type ( pole[k] ) , xlf_type ( M - 1 ) ) ) ; Sum = c[0] + z2n * c[M - 1] ; z2n *= z2n * iz ; for ( n = 1 ; n <= M - 2 ; n++ ) { Sum += ( zn + z2n ) * c[n] ; zn *= z ; z2n *= iz ; } Sum /= ( math_type ( 1.0 ) - zn * zn ) ; } return ( Sum ) ; } /// the initial anticausal coefficient routines are always 
called with the output buffer, /// so they needn't be templated like the icc routines. /// /// I still haven't understood the 'magic' which allows to calculate the initial anticausal /// coefficient from just two results of the causal filter, but I assume it's some exploitation /// of the symmetry of the data. This code is adapted from P. Thevenaz'. math_type iacc_mirror ( const out_buffer_type & _c , int k ) const { int M = _c.size ( ) ; as_math_type < out_buffer_type > c ( _c ) ; math_type z = math_type ( pole[k] ) ; return ( math_type ( z / ( z * z - math_type ( 1.0 ) ) ) * ( c [ M - 1 ] + z * c [ M - 2 ] ) ) ; } /// next are 'antimirrored' BCs. This is the same as 'natural' BCs: the signal is /// extrapolated via point mirroring at the ends, resulting in point-symmetry at the ends, /// which is equivalent to the second derivative being zero, the constraint used in /// the linear algebra approach to calculate 'natural' BCs: /// /// f ( x ) - f ( 0 ) == f ( 0 ) - f ( -x ) ; /// f ( x+n-1 ) - f ( n-1 ) == f ( n-1 ) - f (n-1-x) template < class buffer_type > math_type icc_natural ( const buffer_type & _c , int k ) const { int M = _c.size ( ) ; as_math_type < buffer_type > c ( _c ) ; math_type z = math_type ( pole[k] ) ; math_type zn , z2n , iz ; math_type Sum , c02 ; int n ; // f ( x ) - f ( 0 ) == f ( 0 ) - f (-x) // f ( -x ) == 2 * f ( 0 ) - f (x) if ( horizon[k] < M ) { c02 = c[0] + c[0] ; zn = z ; Sum = c[0] ; for ( n = 1 ; n < horizon[k] ; n++ ) { Sum += zn * ( c02 - c[n] ) ; zn *= z ; } return ( Sum ) ; } else { zn = z ; iz = math_type ( 1.0 ) / z ; z2n = math_type ( pow ( xlf_type ( pole[k] ) , xlf_type ( M - 1 )) ) ; Sum = math_type ( ( math_type ( 1.0 ) + z ) / ( math_type ( 1.0 ) - z ) ) * ( c[0] - z2n * c[M - 1] ) ; z2n *= z2n * iz ; // z2n == z^2M-3 for ( n = 1 ; n <= M - 2 ; n++ ) { Sum -= ( zn - z2n ) * c[n] ; zn *= z ; z2n *= iz ; } return ( Sum / ( math_type ( 1.0 ) - zn * zn )) ; } } /// I still haven't understood the 'magic' which allows to 
calculate the initial anticausal /// coefficient from just two results of the causal filter, but I assume it's some exploitation /// of the symmetry of the data. This code is adapted from P. Thevenaz' formula. math_type iacc_natural ( const out_buffer_type & _c , int k ) const { int M = _c.size ( ) ; as_math_type < out_buffer_type > c ( _c ) ; math_type z = math_type ( pole[k] ) ; return - math_type ( z / ( ( math_type ( 1.0 ) - z ) * ( math_type ( 1.0 ) - z ) ) ) * ( c [ M - 1 ] - z * c [ M - 2 ] ) ; } /// next are reflective BCs. This is mirroring 'between bounds': /// /// f ( -1 - x ) == f ( x ) and f ( n + x ) == f (n-1 - x) /// /// I took Thevenaz' routine for mirrored data as a template and adapted it. /// 'reflective' BCs have some nice properties which make them more suited than mirror BCs in /// some situations: /// - the artificial discontinuity is 'pushed out' half a unit spacing /// - the extrapolated data are just as long as the source data /// - they play well with even splines template < class buffer_type > math_type icc_reflect ( const buffer_type & _c , int k ) const { int M = _c.size ( ) ; as_math_type < buffer_type > c ( _c ) ; math_type z = math_type ( pole[k] ) ; math_type zn , z2n , iz ; math_type Sum ; int n ; if ( horizon[k] < M ) { zn = z ; Sum = c[0] ; for ( n = 0 ; n < horizon[k] ; n++ ) { Sum += zn * c[n] ; zn *= z ; } return ( Sum ) ; } else { zn = z ; iz = math_type ( 1.0 ) / z ; z2n = math_type ( pow ( xlf_type ( pole[k] ) , xlf_type ( 2 * M )) ) ; Sum = 0 ; for ( n = 0 ; n < M - 1 ; n++ ) { Sum += ( zn + z2n ) * c[n] ; zn *= z ; z2n *= iz ; } Sum += ( zn + z2n ) * c[n] ; return c[0] + Sum / ( math_type ( 1.0 ) - zn * zn ) ; } } /// I still haven't understood the 'magic' which allows to calculate the initial anticausal /// coefficient from just one result of the causal filter, but I assume it's some exploitation /// of the symmetry of the data. I have to thank P. 
Thevenaz for his formula which let me code: math_type iacc_reflect ( const out_buffer_type & _c , int k ) const { int M = _c.size ( ) ; as_math_type < out_buffer_type > c ( _c ) ; math_type z = math_type ( pole[k] ) ; return c[M - 1] / ( math_type ( 1.0 ) - math_type ( 1.0 ) / z ) ; } /// next is periodic BCs. so, f ( x ) = f (x+N) /// /// Implementing this is more straightforward than implementing the various mirrored types. /// The mirrored types are, in fact, also periodic, but with a period twice as large, since they /// repeat only after the first reflection. So especially the code for the full loop is more complex /// for mirrored types. The down side here is the lack of symmetry to exploit, which made me code /// a loop for the initial anticausal coefficient as well. template < class buffer_type > math_type icc_periodic ( const buffer_type & _c , int k ) const { int M = _c.size ( ) ; as_math_type < buffer_type > c ( _c ) ; math_type z = math_type ( pole[k] ) ; math_type zn ; math_type Sum ; int n ; if ( horizon[k] < M ) { zn = z ; Sum = c[0] ; for ( n = M - 1 ; n > ( M - horizon[k] ) ; n-- ) { Sum += zn * c[n] ; zn *= z ; } } else { zn = z ; Sum = c[0] ; for ( n = M - 1 ; n > 0 ; n-- ) { Sum += zn * c[n] ; zn *= z ; } Sum /= ( math_type ( 1.0 ) - zn ) ; } return Sum ; } math_type iacc_periodic ( const out_buffer_type & _c , int k ) const { int M = _c.size ( ) ; as_math_type < out_buffer_type > c ( _c ) ; math_type z = math_type ( pole[k] ) ; math_type zn ; math_type Sum ; if ( horizon[k] < M ) { zn = z ; Sum = c[M-1] * z ; for ( int n = 0 ; n < horizon[k] ; n++ ) { zn *= z ; Sum += zn * c[n] ; } Sum = -Sum ; } else { zn = z ; Sum = c[M-1] ; for ( int n = 0 ; n < M - 1 ; n++ ) { Sum += zn * c[n] ; zn *= z ; } Sum = z * Sum / ( zn - math_type ( 1.0 ) ) ; } return Sum ; } /// guess the initial coefficient. 
This tries to minimize the effect /// of starting out with a hard discontinuity as it occurs with zero-padding, /// while at the same time requiring little arithmetic effort /// /// for the forward filter, we guess an extrapolation of the signal to the left /// repeating c[0] indefinitely, which is cheap to compute: template < class buffer_type > math_type icc_guess ( const buffer_type & _c , int k ) const { as_math_type < buffer_type > c ( _c ) ; return c[0] * math_type ( 1.0 / ( 1.0 - pole[k] ) ) ; } // for the backward filter , we assume mirror BC, which is also cheap to compute: math_type iacc_guess ( const out_buffer_type & c , int k ) const { return iacc_mirror ( c , k ) ; } template < class buffer_type > math_type icc_identity ( const buffer_type & _c , int k ) const { as_math_type < buffer_type > c ( _c ) ; return c[0] ; } math_type iacc_identity ( const out_buffer_type & _c , int k ) const { int M = _c.size ( ) ; as_math_type < out_buffer_type > c ( _c ) ; return c[M-1] ; } /// now we come to the solving, or prefiltering code itself. /// The code is adapted from P. Thevenaz' code. /// /// I use a 'carry' element, 'X', to carry the result of the recursion /// from one iteration to the next instead of using the direct implementation /// of the recursion formula, which would read the previous value of the /// recursion from memory by accessing x[n-1], or x[n+1], respectively. void solve_gain_inlined ( const in_buffer_type & _c , out_buffer_type & _x ) const { int M = _c.size ( ) ; assert ( _x.size ( ) == M ) ; as_math_type < in_buffer_type > c ( _c ) ; as_target < out_buffer_type > x ( _x ) ; if ( M == 1 ) { x.store ( c[0] , 0 ) ; return ; } assert ( M > 1 ) ; // use a buffer of one math_type for the recursion (see below) math_type X ; math_type p = math_type ( pole[0] ) ; // use first filter pole, applying overall gain in the process // of consuming the input. 
// Note that the application of the gain is performed during the processing // of the first (maybe the only) pole of the filter, instead of running a separate // loop over the input to apply it before processing starts. // note how the gain is applied to the initial causal coefficient. This is // equivalent to first applying the gain to the input and then calculating // the initial causal coefficient from the processed input. X = math_type ( gain ) * ( this->*_p_icc ) ( _c , 0 ) ; x.store ( X , 0 ) ; /* causal recursion */ // the gain is applied to each input value as it is consumed for ( int n = 1 ; n < M ; n++ ) { X = math_type ( gain ) * c[n] + p * X ; x.store ( X , n ) ; } // now the input is used up and won't be looked at any more; all subsequent // processing operates on the output. /* anticausal initialization */ X = ( this->*_p_iacc ) ( _x , 0 ) ; x.store ( X , M - 1 ) ; /* anticausal recursion */ for ( int n = M - 2 ; 0 <= n ; n-- ) { X = p * ( X - x[n] ) ; x.store ( X , n ) ; } // for the remaining poles, if any, don't apply the gain // and process the result from applying the first pole for ( int k = 1 ; k < npoles ; k++ ) { p = math_type ( pole[k] ) ; /* causal initialization */ X = ( this->*_p_iccx ) ( _x , k ) ; x.store ( X , 0 ) ; /* causal recursion */ for ( int n = 1 ; n < M ; n++ ) { X = x[n] + p * X ; x.store ( X , n ) ; } /* anticausal initialization */ X = ( this->*_p_iacc ) ( _x , k ) ; x.store ( X , M - 1 ) ; /* anticausal recursion */ for ( int n = M - 2 ; 0 <= n ; n-- ) { X = p * ( X - x[n] ) ; x.store ( X , n ) ; } } } /// solve_identity is used for spline degrees 0 and 1. In this case /// there are no poles to apply, but if the operation is not in-place /// and/or there is a 'boost' factor which is different from 1, the /// data are copied and/or amplified with 'boost'. 
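The forward/backward 'carry' pattern used in solve_gain_inlined can be condensed into a stand-alone sketch. This is not vspline code - the name and the std::vector interface are made up for the illustration, the initial coefficients are simply passed in instead of being computed from the boundary conditions, and no gain is applied (like the pass for the second and subsequent poles). It shows how X carries the recursion instead of re-reading x[n-1] or x[n+1] from memory:

```cpp
#include <vector>
#include <cmath>
#include <cassert>

// one causal and one anticausal pass with a single pole p, in-place.
// icc and iacc stand in for the boundary-condition-specific initial
// causal / anticausal coefficients.
void one_pole_pass ( std::vector < double > & x , double p ,
                     double icc , double iacc )
{
  int M = x.size() ;
  double X = icc ;            // initial causal coefficient
  x[0] = X ;
  for ( int n = 1 ; n < M ; n++ )
  {
    X = x[n] + p * X ;        // causal recursion, carrying X
    x[n] = X ;
  }
  X = iacc ;                  // initial anticausal coefficient
  x[M-1] = X ;
  for ( int n = M - 2 ; n >= 0 ; n-- )
  {
    X = p * ( X - x[n] ) ;    // anticausal recursion, carrying X
    x[n] = X ;
  }
}
```

In the real filter, icc and iacc come from the icc_.../iacc_... routines shown earlier, and the first pole's causal pass additionally applies the gain to each consumed input value.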
void solve_identity ( const in_buffer_type & _c , out_buffer_type & _x ) const { int M = _c.size ( ) ; assert ( _x.size ( ) == M ) ; as_math_type < in_buffer_type > c ( _c ) ; as_target < out_buffer_type > x ( _x ) ; if ( boost == xlf_type ( 1 ) ) { // boost is 1, check if operation is not in-place if ( ( void* ) ( _c.data ( ) ) != ( void* ) ( _x.data ( ) ) ) { // operation is not in-place, copy input to output for ( int n = 0 ; n < M ; n++ ) { x.store ( c[n] , n ) ; } } } else { // we have a boost factor, so we apply it. math_type factor = math_type ( boost ) ; for ( int n = 0 ; n < M ; n++ ) { x.store ( factor * c[n] , n ) ; } } } /// The last bit of work left is the constructor. This simply passes /// the specs to the base class constructor, as iir_filter inherits /// from the specs type. public: iir_filter ( const iir_filter_specs & specs ) : iir_filter_specs ( specs ) { // TODO we have a problem if the gain is getting very large, as it happens // for high spline degrees. The iir_filter attenuates the signal to next-to-nothing, // then it's amplified back to the previous amplitude. This degrades the signal, // most noticeably when the numeric type is lo-fi, since there are operations involving // both the attenuated and unattenuated data ('squashing'). if ( npoles < 1 ) { // zero poles means there's nothing to do but possibly // copying the input to the output, which solve_identity // will do if the operation isn't in-place. _p_solve = & filter_type::solve_identity ; return ; } // calculate the horizon for each pole, this is the number of iterations // the filter must perform on a unit pulse to decay below 'tolerance' // If tolerance is 0 (or negative) we set 'horizon' to MAX_INT. This // will have the effect of making it larger than M, or at least so // large that there won't be a difference between the accelerated and // the full loop. We might use a smaller value which still guarantees // the complete decay. 
for ( int i = 0 ; i < npoles ; i++ ) { if ( tolerance > 0 ) horizon.push_back ( ceil ( log ( tolerance ) / log ( std::abs ( pole[i] ) ) ) ) ; else horizon.push_back ( INT_MAX ) ; // TODO quick fix, think about it } // contrary to my initial implementation I use per-axis gain instead of // cumulating the gain for all axes. This may perform slightly worse, but // is more stable numerically and simplifies the code. gain = boost * vspline::overall_gain ( npoles , pole ) ; _p_solve = & filter_type::solve_gain_inlined ; // while the forward/backward IIR iir_filter in the solve_... routines is the same for all // boundary conditions, the calculation of the initial causal and anticausal coefficients // depends on the boundary conditions and is handled by a call through a method pointer // in the solve_... routines. Here we fix these method pointers: if ( bc == MIRROR ) { _p_icc = & filter_type::icc_mirror ; _p_iccx = & filter_type::icc_mirror ; _p_iacc = & filter_type::iacc_mirror ; } else if ( bc == NATURAL ) { _p_icc = & filter_type::icc_natural ; _p_iccx = & filter_type::icc_natural ; _p_iacc = & filter_type::iacc_natural ; } else if ( bc == PERIODIC ) { _p_icc = & filter_type::icc_periodic ; _p_iccx = & filter_type::icc_periodic ; _p_iacc = & filter_type::iacc_periodic ; } else if ( bc == REFLECT ) { _p_icc = & filter_type::icc_reflect ; _p_iccx = & filter_type::icc_reflect ; _p_iacc = & filter_type::iacc_reflect ; } else if ( bc == ZEROPAD ) { _p_icc = & filter_type::icc_identity ; _p_iccx = & filter_type::icc_identity ; _p_iacc = & filter_type::iacc_identity ; } else if ( bc == GUESS ) { _p_icc = & filter_type::icc_guess ; _p_iccx = & filter_type::icc_guess ; _p_iacc = & filter_type::iacc_guess ; } else { throw not_supported ( "boundary condition not supported by vspline::filter" ) ; } } } ; // end of class iir_filter /// class to provide b-spline prefiltering, using 'iir_filter' above. 
/// The actual filter object has to interface with the data handling /// routine ('present', see filter.h). So this class functions as an /// adapter, combining the code needed to set up adequate buffers /// and creation of the actual IIR filter itself. /// The interface to the data handling routine is provided by /// inheriting from class buffer_handling template < template < typename , size_t > class _vtype , typename _math_ele_type , size_t _vsize > struct bspl_prefilter : public buffer_handling < _vtype , _math_ele_type , _vsize > , public vspline::iir_filter < _vtype < _math_ele_type , _vsize > > { // provide this type for queries typedef _math_ele_type math_ele_type ; // we'll use a few types from the buffer_handling type typedef buffer_handling < _vtype , _math_ele_type , _vsize > buffer_handling_type ; using typename buffer_handling_type::vtype ; using buffer_handling_type::vsize ; using buffer_handling_type::init ; // instances of class bspl_prefilter hold the buffer: using allocator_t = typename vspline::allocator_traits < vtype > :: type ; vigra::MultiArray < 1 , vtype , allocator_t > buffer ; // the filter's 'solve' routine has the workhorse code to filter // the data inside the buffer: typedef _vtype < _math_ele_type , _vsize > simdized_math_type ; typedef vspline::iir_filter < simdized_math_type > filter_type ; using filter_type::solve ; // by defining arg_type, we allow code to infer what type of // argument initializer the filter takes typedef iir_filter_specs arg_type ; // the constructor invokes the filter's constructor, // sets up the buffer and initializes the buffer_handling // component to use the whole buffer to accept incoming and // provide outgoing data. 
bspl_prefilter ( const iir_filter_specs & specs , size_t size ) : filter_type ( specs ) , buffer ( size ) { // operate in-place and use the whole buffer to receive and // deliver data init ( buffer , buffer ) ; } ; // operator() simply delegates to the filter's 'solve' routine, // which filters the data in the buffer. void operator() ( ) { solve ( buffer , buffer ) ; } // factory function to provide a filter with the same set of // parameters, but possibly different data types. this is used // for processing of 1D data, where the normal buffering mechanism // may be sidestepped template < typename in_type , typename out_type = in_type , typename math_type = out_type > static vspline::iir_filter < in_type , out_type , math_type > get_raw_filter ( const iir_filter_specs & specs ) { return vspline::iir_filter < in_type , out_type , math_type > ( specs ) ; } } ; /// amplify is used to copy input to output, optionally applying /// 'boost' in the process. If the operation is in-place and 'boost' /// is 1, 'amplify' returns prematurely. template < unsigned int dimension , typename in_value_type , typename out_value_type , typename math_ele_type > void amplify ( vspline::shape_range_type < dimension > range , const vigra::MultiArrayView < dimension , in_value_type > * p_input , vigra::MultiArrayView < dimension , out_value_type > * p_output , math_ele_type boost = 1 , int njobs = vspline::default_njobs ) { // if the operation is in-place and boost is 1, // there is nothing to do. if ( (void*) ( p_input->data() ) == (void*) ( p_output->data() ) && boost == math_ele_type ( 1 ) ) return ; if ( njobs == 1 ) { // njobs == 1 means we're in the single worker thread. // Here we actually perform the task at hand. 
const auto input = p_input->subarray ( range[0] , range[1] ) ; auto output = p_output->subarray ( range[0] , range[1] ) ; auto src = input.begin() ; if ( boost == math_ele_type ( 1 ) ) { for ( auto & trg : output ) { trg = out_value_type ( *src ) ; ++src ; } } else { for ( auto & trg : output ) { trg = out_value_type ( boost * *src ) ; ++src ; } } } else { // njobs > 1 means we've just been called from 'outside', // so we partition the 'full' range we've received and use // 'multithread' to indirectly recurse. auto partitioning = vspline::partition_to_stripes ( range , njobs ) ; auto recall = & ( amplify < dimension , in_value_type , out_value_type , math_ele_type > ) ; vspline::multithread ( recall , partitioning , p_input , p_output , boost , 1 ) ; } } /// 'prefilter' handles b-spline prefiltering for the whole range of /// acceptable input and output. It combines two bodies of code to /// achieve this goal: /// - the b-spline filtering code above /// - 'wielding' code in filter.h, which is not specific to b-splines. /// /// Note that vsize , the vectorization width, can be passed explicitly. /// If Vc is in use and math_ele_type can be used with hardware /// vectorization, the arithmetic will be done with Vc::SimdArrays /// of the given size. Otherwise 'goading' will be used: the data are /// presented in TinyVectors of vsize math_ele_type, hoping that the /// compiler may autovectorize the operation. 
template < unsigned int dimension , typename in_value_type , typename out_value_type , typename math_ele_type , size_t vsize = vspline::vector_traits<math_ele_type>::size > void prefilter ( const vigra::MultiArrayView < dimension , in_value_type > & input , vigra::MultiArrayView < dimension , out_value_type > & output , vigra::TinyVector < bc_code , dimension > bcv , int degree , xlf_type tolerance , xlf_type boost = xlf_type ( 1 ) , int njobs = default_njobs ) { if ( degree <= 1 ) { // if degree is <= 1, there is no filter to apply, but we may need // to apply 'boost' and/or copy input to output. We use 'amplify' // for the purpose, which multithreads the operation (if it is at // all necessary). I found this is (slightly) faster than doing the // job in a single thread - the process is mainly memory-bound, so // the gain is moderate. auto full_range = vspline::shape_range_type < dimension > ( vspline::shape_type < dimension >() , input.shape() ) ; amplify < dimension , in_value_type , out_value_type , math_ele_type > ( full_range , &input , &output , math_ele_type ( boost ) ) ; return ; } std::vector < vspline::iir_filter_specs > vspecs ; // package the arguments to the filter; one set of arguments // per axis of the data auto poles = vspline_constants::precomputed_poles [ degree ] ; for ( int axis = 0 ; axis < dimension ; axis++ ) { vspecs.push_back ( vspline::iir_filter_specs ( bcv [ axis ] , degree / 2 , poles , tolerance , 1 ) ) ; } // 'boost' is only applied to dimension 0, since it is meant to // affect the whole data set just once, not once per axis. vspecs [ 0 ] . boost = boost ; // KFJ 2018-05-08 with the automatic use of vectorization the // distinction whether math_ele_type is 'vectorizable' or not // is no longer needed: simdized_type will be a Vc::SimdArray // if possible, a vspline::simd_tv otherwise.
typedef typename vspline::bspl_prefilter < vspline::simdized_type , math_ele_type , vsize > filter_type ; // now call the 'wielding' code in filter.h vspline::filter < in_value_type , out_value_type , dimension , filter_type > ( input , output , vspecs ) ; } } ; // namespace vspline #endif // VSPLINE_PREFILTER_H kfj-vspline-4b365417c271/thread_pool.h000066400000000000000000000145121333775006700173670ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* The git repository for this software is at */ /* */ /* https://bitbucket.org/kfj/vspline */ /* */ /* Please direct questions, bug reports, and contributions to */ /* */ /* kfjahnke+vspline@gmail.com */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /// \file thread_pool.h /// /// \brief provides a thread pool for vspline's multithread() routine /// /// class thread_pool aims to provide a simple and straightforward implementation /// of a thread pool for multithread() in multithread.h, but the class might find /// use elsewhere. The operation is simple, I think of it as 'piranha mode' ;) /// /// a set of worker threads is launched which wait for 'tasks', which come in the shape /// of std::function<void()>, from a queue. When woken, a worker thread tries to obtain /// a task. If it succeeds, the task is executed, and the worker thread tries to get /// another task. If none is to be had, it goes to sleep, waiting to be woken once /// there are new tasks. #include <thread> #include <queue> #include <mutex> #include <condition_variable> #include <functional> namespace vspline { class thread_pool { // used to switch off the worker threads at program termination. // access under task_mutex. bool stay_alive = true ; // the thread pool itself is held in this variable. The pool // does not change after construction std::vector < std::thread * > pool ; public: // mutex and condition variable for interaction with the task queue // and stay_alive std::mutex task_mutex ; std::condition_variable task_cv ; // queue to hold tasks. access under task_mutex std::queue < std::function < void() > > task_queue ; private: /// code to run a worker thread /// We use a thread pool of worker threads. These threads have a very /// simple cycle: They try and obtain a task (std::function<void()>). /// If there is one to be had, it is invoked, otherwise they wait on /// task_cv.
When woken up, the flag stay_alive is checked, and if it /// is found to be false, the worker thread ends. void worker_thread() { while ( true ) { // under task_mutex, check stay_alive and try to obtain a task std::unique_lock<std::mutex> task_lock ( task_mutex ) ; if ( ! stay_alive ) { task_lock.unlock() ; break ; // die } if ( task_queue.size() ) { // there are tasks in the queue, take one auto task = task_queue.front() ; task_queue.pop() ; task_lock.unlock() ; // got a task, perform it, then try for another one task() ; } else { // no luck. wait. task_cv.wait ( task_lock ) ; // simply wait, spurious alert is okay } // start next cycle, either after having completed a job // or after having been woken by an alert } } public: thread_pool ( int nthreads = 4 * std::thread::hardware_concurrency() ) { // to launch a thread with a method, we need to bind it to the object: std::function < void() > wf = std::bind ( &thread_pool::worker_thread , this ) ; // now we can fill the pool with worker threads for ( int t = 0 ; t < nthreads ; t++ ) pool.push_back ( new std::thread ( wf ) ) ; } int get_nthreads() const { return pool.size() ; } ~thread_pool() { { // under task_mutex, set stay_alive to false std::lock_guard<std::mutex> task_lock ( task_mutex ) ; stay_alive = false ; } // wake all inactive worker threads, // join all worker threads once they are finished task_cv.notify_all() ; for ( auto threadp : pool ) { threadp->join() ; } // once all are joined, delete their std::thread object for ( auto threadp : pool ) { delete threadp ; } } } ; } ; // end of namespace vspline kfj-vspline-4b365417c271/transform.h000066400000000000000000001537731333775006700171150ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F.
Jahnke */ /* */ /* The git repository for this software is at */ /* */ /* https://bitbucket.org/kfj/vspline */ /* */ /* Please direct questions, bug reports, and contributions to */ /* */ /* kfjahnke+vspline@gmail.com */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /*! \file transform.h \brief set of generic remap, transform and apply functions My foremost reason to have efficient B-spline processing is the formulation of generic remap-like functions. remap() is a function which takes an array of real-valued nD coordinates and an interpolator over a source array. Now each of the real-valued coordinates is fed into the interpolator in turn, yielding a value, which is placed in the output array at the same place the coordinate occupies in the coordinate array. 
To put it concisely, if we have - c, the coordinate array (or 'warp' array, 'arguments' array) - a, the source array (containing 'original' or 'knot point' data) - i, the interpolator over a - j, a coordinate into both c and t - t, the target array (receiving the 'result' of the remap) remap defines the operation t[j] = i(c[j]) for all j Now we widen the concept of remapping to a 'transform' function. Instead of limiting the process to the use of an 'interpolator', we use an arbitrary unary functor transforming incoming values to outgoing values, where the type of the incoming and outgoing values is determined by the functor. If the functor actually is an interpolator, we have a 'true' remap transforming coordinates into values, but this is merely a special case. So here we have: - c, an array containing input values - f, a unary functor converting input to output values - j, a coordinate into c and t - t, the target array transform performs the operation t[j] = f(c[j]) for all j remaps/transforms to other-dimensional objects are supported. This makes it possible to, for example, remap from a volume to a 2D image, using a 2D coordinate array containing 3D coordinates ('slicing' a volume) There is also a variant of this transform function in this file, which doesn't take an input array. Instead, for every target location, the location's discrete coordinates are passed to the unary_functor type object. This way, transformation-based remaps can be implemented easily: the user code just has to provide a suitable functor to yield values for coordinates. This functor will internally take the discrete incoming coordinates (into the target array) and take it from there, eventually producing values of the target array's value_type. 
Here we have: - f, a unary functor converting discrete coordinates to output values - j, a discrete coordinate into t - t, the target array 'index-based' transform performs the operation t[j] = f(j) for all j This file also has code to evaluate a b-spline at coordinates in a grid, which can be used for scaling, and for separable geometric transformations. Finally there is a function to restore the original data from a b-spline to the precision possible with the given data type and degree of the spline. This is done with a separable convolution, using a unit-stepped sampling of the basis function as the convolution kernel along every axis. Let me reiterate the strategy used to perform the transforms and remaps in this file. The approach is functional: A 'processing chain' is set up and encoded as a functor providing two evaluation functions: one for 'single' data and one for vectorized data. This functor is applied to the data by 'wielding' code, which partitions the data into several jobs to be performed by individual worker threads, invokes the worker threads, and, once in the worker thread, feeds the data to the functor in turn, using hardware vectorization if possible. So while at the user code level a single call to some 'transform' or 'remap' routine is issued, passing in arrays of data and functors, all the 'wielding' is done automatically without any need for the user to even be aware of it, while still using highly efficient vector code with a thread pool, potentially speeding up the operation greatly as compared to single-threaded, unvectorized operation, of course depending on the presence of several cores and vector units. On my system (Haswell 4-core i5 with AVX2) the speedup is about one order of magnitude. The only drawback is the production of a hardware-specific binary if vectorization is used. 
Both Vc use and multithreading can be easily activated/deactivated by #define switches, providing a clean fallback solution to produce code suitable for any target, even simple single-core machines with no vector units. Vectorization will be used if possible - either explicit vectorization (by defining USE_VC) - or autovectorization per default. Defining VSPLINE_SINGLETHREAD will disable multithreading. The code accessing multithreading and/or Vc use is #ifdeffed, so if these features are disabled, their code 'disappears' and the relevant headers are not included, nor do the corresponding libraries have to be present. */ // TODO: don't multithread or reduce number of jobs for small data sets #ifndef VSPLINE_TRANSFORM_H #define VSPLINE_TRANSFORM_H #include "multithread.h" // vspline's multithreading code #include "eval.h" // evaluation of b-splines #include "poles.h" #include "convolve.h" // If user code defines VSPLINE_DEFAULT_PARTITIONER, that's what's used // as the default partitioner. Otherwise, we use partition_to_stripes // as default partitioner, which does a good job in most situations // and never a 'really' bad job - hopefully ;) #ifndef VSPLINE_DEFAULT_PARTITIONER #define VSPLINE_DEFAULT_PARTITIONER vspline::partition_to_stripes #endif // The bulk of the implementation of vspline's two 'transform' functions // is now in wielding.h: #include "wielding.h" namespace vspline { /// implementation of two-array transform using wielding::coupled_wield. /// /// 'array-based' transform takes two template arguments: /// /// - 'unary_functor_type', which is a class satisfying the interface /// laid down in unary_functor.h. Typically, this would be a type /// inheriting from vspline::unary_functor, but any type will do as /// long as it provides the required typedefs and the relevant /// eval() routines. 
/// /// - the dimensionality of the input and output array /// /// this overload of transform takes three parameters: /// /// - a reference to a const unary_functor_type object providing the /// functionality needed to generate values from arguments. /// /// - a reference to a const MultiArrayView holding arguments to feed to /// the unary functor object. It has to have the same shape as the target /// array and contain data of the unary_functor's 'in_type'. /// /// - a reference to a MultiArrayView to use as a target. This is where the /// resulting data are put, so it has to contain data of unary_functor's /// 'out_type'. It has to have the same shape as the input array. /// /// transform can be used without template arguments, they will be inferred /// by ATD from the arguments. template < typename unary_functor_type , unsigned int dimension > void transform ( const unary_functor_type & functor , const vigra::MultiArrayView < dimension , typename unary_functor_type::in_type > & input , vigra::MultiArrayView < dimension , typename unary_functor_type::out_type > & output ) { // check shape compatibility if ( output.shape() != input.shape() ) { throw vspline::shape_mismatch ( "transform: the shapes of the input and output array do not match" ) ; } // set up a range covering the whole source/target array vspline::shape_type < dimension > begin ; vspline::shape_type < dimension > end = output.shape() ; vspline::shape_range_type < dimension > range ( begin , end ) ; // wrap the vspline::unary_functor to be used with wielding code. // The wrapper is necessary because the code in wielding.h feeds // arguments as TinyVectors, even if the data are 'singular'. // The wrapper simply reinterpret_casts any TinyVectors of one // element to their corresponding value_type before calling the // functor. In other words, to use my own terminology: 'canonical' // types are reinterpret_cast to 'synthetic' types. 
typedef wielding::vs_adapter < unary_functor_type > coupled_functor_type ; coupled_functor_type coupled_functor ( functor ) ; // we'll cast the pointers to the arrays to these types to be // compatible with the wrapped functor above. typedef typename coupled_functor_type::in_type src_type ; typedef typename coupled_functor_type::out_type trg_type ; typedef vigra::MultiArrayView < dimension , src_type > src_view_type ; typedef vigra::MultiArrayView < dimension , trg_type > trg_view_type ; // now delegate to the wielding code wielding::coupled_wield < coupled_functor_type , dimension > ( range , vspline::default_njobs , coupled_functor , (src_view_type*)(&input) , (trg_view_type*)(&output) ) ; } /// implementation of index-based transform using wielding::index_wield /// /// this overload of transform() is very similar to the first one, but /// instead of picking input from an array, it feeds the discrete coordinates /// of the successive places data should be rendered to to the /// unary_functor_type object. /// /// This sounds complicated, but is really quite simple. Let's assume you have /// a 2X3 output array to fill with data. When this array is passed to transform, /// the functor will be called with every coordinate pair in turn, and the result /// the functor produces is written to the output array. So for the example given, /// with 'ev' being the functor, we have this set of operations: /// /// output [ ( 0 , 0 ) ] = ev ( ( 0 , 0 ) ) ; /// /// output [ ( 1 , 0 ) ] = ev ( ( 1 , 0 ) ) ; /// /// output [ ( 2 , 0 ) ] = ev ( ( 2 , 0 ) ) ; /// /// output [ ( 0 , 1 ) ] = ev ( ( 0 , 1 ) ) ; /// /// output [ ( 1 , 1 ) ] = ev ( ( 1 , 1 ) ) ; /// /// output [ ( 2 , 1 ) ] = ev ( ( 2 , 1 ) ) ; /// /// this transform overload takes one template argument: /// /// - 'unary_functor_type', which is a class satisfying the interface laid /// down in unary_functor.h. 
This is an object which can provide values /// given *discrete* coordinates, like class evaluator, but generalized /// to allow for arbitrary ways of achieving its goal. The unary functor's /// 'in_type' determines the number of dimensions of the coordinates - since /// they are coordinates into the target array, the functor's input type /// has to have the same number of dimensions as the target. The functor's /// 'out_type' has to be the same as the data type of the target array, since /// the target array stores the results of calling the functor. /// /// this transform overload takes two parameters: /// /// - a reference to a const unary_functor_type object providing the /// functionality needed to generate values from discrete coordinates /// /// - a reference to a MultiArrayView to use as a target. This is where the /// resulting data are put. /// /// Please note that vspline holds with vigra's coordinate handling convention, /// which puts the fastest-changing index first. In a 2D image processing /// context, this is the column index, or the x coordinate. C and C++ /// instead put this index last when using multidimensional array access code. /// /// transform can be used without template arguments, they will be inferred /// by ATD from the arguments. 
template < class unary_functor_type > void transform ( const unary_functor_type & functor , vigra::MultiArrayView < unary_functor_type::dim_in , typename unary_functor_type::out_type > & output ) { enum { dimension = unary_functor_type::dim_in } ; // set up a range covering the whole target array vspline::shape_type < dimension > begin ; vspline::shape_type < dimension > end = output.shape() ; vspline::shape_range_type < dimension > range ( begin , end ) ; // wrap the vspline::unary_functor to be used with wielding code typedef wielding::vs_adapter < unary_functor_type > index_functor_type ; index_functor_type index_functor ( functor ) ; // we'll cast the pointer to the target array to this type to be // compatible with the wrapped functor above typedef typename index_functor_type::out_type trg_type ; typedef vigra::MultiArrayView < dimension , trg_type > trg_view_type ; // now delegate to the wielding code wielding::index_wield < index_functor_type , dimension > ( range , vspline::default_njobs , index_functor , (trg_view_type*)(&output) ) ; } /// we code 'apply' as a special variant of 'transform' where the output /// is also used as input, so the effect is to feed the unary functor /// each 'output' value in turn, let it process it and store the result /// back to the same location. While this looks like a rather roundabout /// way of performing an apply, it has the advantage of using the same /// type of functor (namely one with const input and writable output), /// rather than a different functor type which modifies its argument /// in-place. 
While, at this level, using such a functor looks like a /// feasible idea, it would require specialized code 'further down the /// line' when complex functors are built with vspline's functional /// programming tools: the 'apply-capable' functors would need to read /// the output values first and write them back after anyway, resulting /// in the same sequence of loads and stores as we get with the current /// 'fake apply' implementation. template < typename unary_functor_type , // functor to apply unsigned int dimension > // input/output array's dimension void apply ( const unary_functor_type & ev , vigra::MultiArrayView < dimension , typename unary_functor_type::out_type > & output ) { // make sure the functor's input and output type are the same static_assert ( std::is_same < typename unary_functor_type::in_type , typename unary_functor_type::out_type > :: value , "apply: functor's input and output type must be the same" ) ; // delegate to transform transform ( ev , output , output ) ; } /// a type for a set of boundary condition codes, one per axis template < unsigned int dimension > using bcv_type = vigra::TinyVector < vspline::bc_code , dimension > ; /// Implementation of 'classic' remap, which directly takes an array of /// values and remaps it, internally creating a b-spline of given order /// just for the purpose. This is used for one-shot remaps where the spline /// isn't reused, and specific to b-splines, since the functor used is a /// b-spline evaluator. The spline defaults to a cubic b-spline with /// mirroring on the bounds. /// /// So here we have the 'classic' remap, where the input array holds /// coordinates and the functor used is actually an interpolator. Since /// this is merely a special case of using transform(), we delegate to /// transform() once we have the evaluator. 
/// /// The template arguments are chosen to allow the user to call 'remap' /// without template arguments; the template arguments can be found by ATD /// by looking at the MultiArrayViews passed in. /// /// - original_type is the value_type of the array holding the 'original' data over /// which the interpolation is to be performed /// /// - result_type is the value_type of the array taking the result of the remap, /// namely the values produced by the interpolation. these data must have as /// many channels as original_type /// /// - coordinate_type is the type for coordinates at which the interpolation is to /// be performed. coordinates must have as many components as the input array /// has dimensions. /// /// optionally, remap takes a set of boundary condition values and a spline /// degree, to allow creation of splines for specific use cases beyond the /// default. I refrain from extending the argument list further; user code with /// more specific requirements will have to create an evaluator and use 'transform'. /// /// Note that remap can be called without template arguments, the types will /// be inferred by ATD from the arguments passed in. template < typename original_type , // data type of original data typename result_type , // data type for interpolated data typename coordinate_type , // data type for coordinates unsigned int cf_dimension , // dimensionality of original data unsigned int trg_dimension , // dimensionality of result array int bcv_dimension = cf_dimension > // see below. g++ ATD needs this. 
void remap ( const vigra::MultiArrayView < cf_dimension , original_type > & input , const vigra::MultiArrayView < trg_dimension , coordinate_type > & coordinates , vigra::MultiArrayView < trg_dimension , result_type > & output , bcv_type < bcv_dimension > bcv = bcv_type < bcv_dimension > ( MIRROR ) , int degree = 3 ) { static_assert ( vigra::ExpandElementResult < original_type > :: size == vigra::ExpandElementResult < result_type > :: size , "input and output data type must have same nr. of channels" ) ; static_assert ( cf_dimension == vigra::ExpandElementResult < coordinate_type > :: size , "coordinate type must have the same dimension as input array" ) ; // this is silly, but when specifying bcv_type < cf_dimension >, the code failed // to compile with g++. So I use a separate template argument bcv_dimension // and static_assert it's the same as cf_dimension. TODO this sucks... static_assert ( cf_dimension == bcv_dimension , "boundary condition specification needs same dimension as input array" ) ; // check shape compatibility if ( output.shape() != coordinates.shape() ) { throw shape_mismatch ( "the shapes of the coordinate array and the output array must match" ) ; } // get a suitable type for the b-spline's coefficients typedef typename vigra::PromoteTraits < original_type , result_type > :: Promote _cf_type ; typedef typename vigra::NumericTraits < _cf_type > :: RealPromote cf_type ; // create the bspline object typedef typename vspline::bspline < cf_type , cf_dimension > spline_type ; spline_type bsp ( input.shape() , degree , bcv ) ; // prefilter, taking data in 'input' as knot point data bsp.prefilter ( input ) ; // since this is a commodity function, we use a 'safe' evaluator. // If maximum performance is needed and the coordinates are known to be // in range, user code should create a 'naked' vspline::evaluator and // use it with vspline::transform. // Note how we pass in 'rc_type', the elementary type of a coordinate. 
// We want to allow the user to pass float or double coordinates. typedef typename vigra::ExpandElementResult < coordinate_type > :: type rc_type ; auto ev = vspline::make_safe_evaluator < spline_type , rc_type > ( bsp ) ; // call transform(), passing in the evaluator, // the coordinate array and the target array transform ( ev , coordinates , output ) ; } // next we have code for evaluation of b-splines over grids of coordinates. // This code lends itself to some optimizations, since part of the weight // generation used in the evaluation process is redundant, and by // precalculating all redundant values and referring to the precalculated // values during the evaluation a good deal of time can be saved - provided // that the data involved are nD. // TODO: as in separable convolution, it might be profitable here to apply // weights for one axis to the entire array, then repeat with the other axes // in turn. storing, modifying and rereading the array may still // come out faster than the rather expensive DDA needed to produce the // value with weighting in all dimensions applied at once, as the code // below does (by simply applying the weights in the innermost eval // class evaluator offers). The potential gain ought to increase with // the dimensionality of the data. // for evaluation over grids of coordinates, we use a vigra::TinyVector // of 1D MultiArrayViews holding the component coordinates for each // axis. // When default-constructed, this object holds default-constructed // MultiArrayViews, which, when assigned another MultiArrayView, // will hold another view over the data, rather than copying them. // initially I was using a small array of pointers for the purpose, // but that is potentially unsafe and does not allow passing strided // data. 
template < unsigned int dimension , typename rc_ele_type = float > using grid_spec = vigra::TinyVector < vigra::MultiArrayView < 1 , rc_ele_type > , dimension > ; namespace detail // workhorse code for grid_eval { // in grid_weight, for every dimension we have a set of spline_order // weights for every position in this dimension. in grid_ofs, we have the // partial offset for this dimension for every position. these partial // offsets are the product of the index for this dimension at the position // and the stride for this dimension, so that the sum of the partial // offsets for all dimensions yields the offset into the coefficient array // to the window of coefficients where the weights are to be applied. // First we have code for 'level' > 0. _grid_eval uses a recursive // descent through the dimensions, starting with the highest one and // working its way down to 'level 0', the x axis. For level > 0, // the vectorized and unvectorized code is the same: template < typename evaluator_type , int level , size_t _vsize = 0 > struct _grid_eval { // glean the data type for results and for MultiArrayView of // results - this is where the result of the operation goes typedef typename evaluator_type::trg_type trg_type ; typedef vigra::MultiArrayView < level + 1 , trg_type > target_view_type ; // get the type of a 'weight', a factor to apply to a coefficient. // we obtain this from the evaluator's 'inner_type', which has the // actual evaluation code, while class evaluator merely interfaces // to it. 
typedef typename evaluator_type::inner_type iev_type ; typedef typename iev_type::math_ele_type weight_type ; void operator() ( int initial_ofs , vigra::MultiArrayView < 2 , weight_type > & weight , weight_type** const & grid_weight , const int & spline_order , int ** const & grid_ofs , const evaluator_type & itp , target_view_type & result ) { // iterating along the axis 'level', we fix a coordinate 'c' // to every possible value in turn for ( int c = 0 ; c < result.shape ( level ) ; c++ ) { // we pick the set of weights corresponding to the axis 'level' // from 'grid_weight' for this level for ( int e = 0 ; e < spline_order ; e++ ) { weight [ vigra::Shape2 ( e , level ) ] = grid_weight [ level ] [ spline_order * c + e ] ; } // cum_ofs, the cumulated offset, is the sum of the partial // offsets for all levels. here we add the contribution picked // from 'grid_ofs' for this level, at coordinate c int cum_ofs = initial_ofs + grid_ofs [ level ] [ c ] ; // for the recursive descent, we create a subdimensional slice // of 'result', fixing the coordinate for axis 'level' at c: auto region = result.bindAt ( level , c ) ; // now we call _grid_eval recursively for the next-lower level, // passing on the arguments, which have received the current level's // additions, and the slice we've created in 'region'. _grid_eval < evaluator_type , level - 1 , _vsize >() ( cum_ofs , weight , grid_weight , spline_order , grid_ofs , itp , region ) ; } } } ; /// At level 0 the recursion ends. 'result' is now a 1D MultiArrayView, /// which is easy to process. Here we perform the actual evaluation. /// With template argument _vsize unfixed, we have the vector code, /// below is a specialization for _vsize == 1 which is unvectorized. 
template < typename evaluator_type , size_t _vsize > struct _grid_eval < evaluator_type , 0 , _vsize > { enum { vsize = evaluator_type::vsize } ; enum { channels = evaluator_type::channels } ; typedef typename evaluator_type::math_ele_type weight_type ; typedef typename vspline::vector_traits < weight_type , vsize > :: ele_v math_ele_v ; typedef typename evaluator_type::trg_ele_type trg_ele_type ; typedef typename evaluator_type::trg_type trg_type ; typedef typename vspline::vector_traits < trg_type , vsize > :: nd_ele_v trg_v ; typedef vigra::MultiArrayView < 1 , trg_type > target_view_type ; typedef typename evaluator_type::inner_type iev_type ; typedef typename iev_type::ofs_ele_type ofs_ele_type ; typedef typename vspline::vector_traits < ofs_ele_type , vsize > :: ele_v ofs_ele_v ; typedef typename iev_type::trg_type trg_syn_type ; typedef vigra::MultiArrayView < 1 , trg_syn_type > target_syn_view_type ; void operator() ( int initial_ofs , vigra::MultiArrayView < 2 , weight_type > & weight , weight_type** const & grid_weight , const int & spline_order , int ** const & grid_ofs , const evaluator_type & itp , target_view_type & _region ) { // we'll be using the evaluator's 'inner evaluator' for the actual // evaluation, which operates on 'synthetic' types. 
Hence this cast: auto & region = reinterpret_cast < target_syn_view_type & > ( _region ) ; // number of vectorized results int aggregates = region.size() / vsize ; // have storage ready for vectorized weights using allocator_t = typename vspline::allocator_traits < math_ele_v > :: type ; vigra::MultiArray < 2 , math_ele_v , allocator_t > vweight ( weight.shape() ) ; // ditto, for vectorized offsets ofs_ele_v select ; // and a buffer for vectorized target data trg_v vtarget ; // initialize the vectorized weights for dimensions > 0 // These remain constant throughout this routine, since the recursive // descent has successively fixed them (with the bindAt operation), // and in the course of the recursive descent they were deposited in // 'weight', where we now pick them up. Note how 'weight' holds these // values as fundamentals (weight_type), and now they are broadcast to // a vector type (math_ele_v), containing vsize identical copies. for ( int d = 1 ; d < weight.shape(1) ; d++ ) { for ( int o = 0 ; o < spline_order ; o++ ) { vweight [ vigra::Shape2 ( o , d ) ] = weight [ vigra::Shape2 ( o , d ) ] ; } } // get a pointer to the target array's data (seen as elementary type) trg_ele_type * p_target = (trg_ele_type*) ( region.data() ) ; // and the stride, if any, also in terms of the elementary type, from // one cluster of target data to the next int stride = vsize * channels * region.stride(0) ; // calculate scatter indexes for depositing result data const auto indexes = ofs_ele_v::IndexesFromZero() * channels * region.stride(0) ; // now the peeling run, processing vectorized data as long as we // have full vectors to process for ( int a = 0 ; a < aggregates ; a++ ) { // gather the individual weights into the vectorized form. // this operation gathers the level-0 weights, which are the only // part of the weights which vary throughout this routine, while the // higher-level weights have been fixed above. 
for ( int o = 0 ; o < spline_order ; o++ ) { vweight[ vigra::Shape2 ( o , 0 ) ].gather ( grid_weight [ 0 ] + spline_order * a * vsize , spline_order * ofs_ele_v::IndexesFromZero() + o ) ; } // get a set of vsize offsets from grid_ofs select.load ( grid_ofs [ 0 ] + a * vsize ) ; // add cumulated offsets from higher dimensions select += initial_ofs ; // now we can call the vectorized eval routine of evaluator's // 'inner' object. itp.inner.eval ( select , vweight , vtarget ) ; // finally we scatter the vectorized result to target memory for ( int e = 0 ; e < channels ; e++ ) vtarget[e].scatter ( p_target + e , indexes ) ; // and set p_target to the next cluster of target values p_target += stride ; } // the first position unaffected by the peeling run is here: int c0 = aggregates * vsize ; // create an iterator into target array pointing to this position // in the target array auto iter = region.begin() + c0 ; // now we finish off the stragglers, which is essentially the // same code as in the unvectorized specialization below. for ( int c = c0 ; c < region.shape ( 0 ) ; c++ ) { // pick up the level-0 weights at this coordinate for ( int e = 0 ; e < spline_order ; e++ ) { weight [ vigra::Shape2 ( e , 0 ) ] = grid_weight [ 0 ] [ spline_order * c + e ] ; } // add the last summand to the cumulated offset int cum_ofs = initial_ofs + grid_ofs [ 0 ] [ c ] ; // now we have everything together and can evaluate. itp.inner.eval ( cum_ofs , weight , *iter ) ; ++iter ; } } } ; /// unvectorized specialization of _grid_eval at level 0. This is, /// essentially, the vectorized code above minus the peeling run. 
template < typename evaluator_type > struct _grid_eval < evaluator_type , 0 , 1 > { typedef typename evaluator_type::math_ele_type weight_type ; typedef typename evaluator_type::trg_ele_type trg_ele_type ; typedef typename evaluator_type::trg_type trg_type ; typedef vigra::MultiArrayView < 1 , trg_type > target_view_type ; typedef typename evaluator_type::inner_type iev_type ; enum { channels = evaluator_type::channels } ; typedef vigra::TinyVector < trg_ele_type , channels > trg_syn_type ; typedef vigra::MultiArrayView < 1 , trg_syn_type > target_syn_view_type ; void operator() ( int initial_ofs , vigra::MultiArrayView < 2 , weight_type > & weight , weight_type** const & grid_weight , const int & spline_order , int ** const & grid_ofs , const evaluator_type & itp , target_view_type & _region ) { auto & region = reinterpret_cast < target_syn_view_type & > ( _region ) ; auto iter = region.begin() ; for ( int c = 0 ; c < region.shape ( 0 ) ; c++ ) { for ( int e = 0 ; e < spline_order ; e++ ) { weight [ vigra::Shape2 ( e , 0 ) ] = grid_weight [ 0 ] [ spline_order * c + e ] ; } int cum_ofs = initial_ofs + grid_ofs [ 0 ] [ c ] ; itp.inner.eval ( cum_ofs , weight , *iter ) ; ++iter ; } } } ; /// Here is the single-threaded code for the grid_eval function. /// The first argument is a shape range, defining the subsets of data /// to process in a single thread. the remainder are forwards of the /// arguments to grid_eval, as pointers. The call is effected via /// 'multithread()' which sets up the partitioning and distribution /// to threads from a thread pool. 
template < typename evaluator_type > // b-spline evaluator type void st_grid_eval ( shape_range_type < evaluator_type::dim_in > range , grid_spec < evaluator_type::dim_in , typename evaluator_type::rc_ele_type > * p_grid , const evaluator_type * itp , vigra::MultiArrayView < evaluator_type::dim_in , typename evaluator_type::trg_type > * p_result ) { enum { dimension = evaluator_type::dim_in } ; typedef typename evaluator_type::math_ele_type weight_type ; typedef typename evaluator_type::rc_ele_type rc_type ; typedef vigra::MultiArrayView < dimension , typename evaluator_type::trg_type > target_type ; const int spline_order = itp->inner.get_order() ; // pick the subarray of the 'whole' target array pertaining // to this thread's range auto result = p_result->subarray ( range[0] , range[1] ) ; // pick the subset of coordinates pertaining to this thread's // range grid_spec < evaluator_type::dim_in , typename evaluator_type::rc_ele_type > r_grid ; auto & grid ( *p_grid ) ; // 'subarray' for 1D vigra::MultiArrayView can't take plain 'long', // so we have to package the limits of the range in TinyVectors vigra::TinyVector < typename evaluator_type::rc_ele_type , 1 > _begin ; vigra::TinyVector < typename evaluator_type::rc_ele_type , 1 > _end ; for ( int d = 0 ; d < dimension ; d++ ) { // would like to do this: // r_grid[d] = grid[d].subarray ( range[0][d] , range[1][d] ) ; _begin[0] = range[0][d] ; _end[0] = range[1][d] ; r_grid[d] = grid[d].subarray ( _begin , _end ) ; } // set up storage for precalculated weights and offsets weight_type * grid_weight [ dimension ] ; int * grid_ofs [ dimension ] ; // get some metrics TinyVector < int , dimension > shape ( result.shape() ) ; TinyVector < int , dimension > estride ( itp->inner.get_estride() ) ; // allocate space for the per-axis weights and offsets for ( int d = 0 ; d < dimension ; d++ ) { grid_weight[d] = new weight_type [ spline_order * shape [ d ] ] ; grid_ofs[d] = new int [ shape [ d ] ] ; } int select ; rc_type tune 
; // fill in the weights and offsets, using the interpolator's // split() to split the coordinates received in grid_coordinate, // the interpolator's obtain_weights() method to produce the // weight components, and the strides of the coefficient array // to convert the integral parts of the coordinates into offsets. for ( int d = 0 ; d < dimension ; d++ ) { for ( int c = 0 ; c < shape [ d ] ; c++ ) { itp->inner.split ( r_grid [ d ] [ c ] , select , tune ) ; itp->inner.obtain_weights ( grid_weight [ d ] + spline_order * c , d , tune ) ; grid_ofs [ d ] [ c ] = select * estride [ d ] ; } } // allocate storage for a set of singular weights using allocator_t = typename vspline::allocator_traits < weight_type > :: type ; vigra::MultiArray < 2 , weight_type , allocator_t > weight ( vigra::Shape2 ( spline_order , dimension ) ) ; // now call the recursive workhorse routine detail::_grid_eval < evaluator_type , dimension - 1 , evaluator_type::vsize >() ( 0 , weight , grid_weight , spline_order , grid_ofs , *itp , result ) ; // clean up for ( int d = 0 ; d < dimension ; d++ ) { delete[] grid_weight[d] ; delete[] grid_ofs[d] ; } } } ; // end of namespace detail /// this is the multithreaded version of grid_eval, which sets up the /// full range over 'result' and calls 'multithread' to do the rest /// /// grid_eval evaluates a b-spline object /// at points whose coordinates are distributed in a grid, so that for /// every axis there is a set of as many coordinates as this axis is long, /// which will be used in the grid as the coordinate for this axis at the /// corresponding position. The resulting coordinate matrix (which remains /// implicit) is like a grid made from the per-axis coordinates. Note how /// these coordinates needn't be evenly spaced, or in any specific order. /// Evenly spaced coordinates would hold the potential for even further /// optimization, specifically when decimation or up/downsampling are /// needed. 
So far I haven't coded for these special cases, and since the /// bulk of the processing time here is used up for memory access (to load /// the relevant coefficients), and for the arithmetic to apply the weights /// to the coefficients, the extra performance gain would be moderate. /// /// If we have two dimensions and x coordinates x0, x1 and x2, and y /// coordinates y0 and y1, the resulting implicit coordinate matrix is /// /// (x0,y0) (x1,y0) (x2,y0) /// /// (x0,y1) (x1,y1) (x2,y1) /// /// since the offsets and weights needed to perform an interpolation /// only depend on the coordinates, this highly redundant coordinate array /// can be processed more efficiently by precalculating the offset component /// and weight component for all axes and then simply permuting them to /// obtain the result. Especially for higher-degree and higher-dimensional /// splines this saves quite some time, since the generation of weights /// is computationally expensive. /// /// grid_eval is useful for generating a scaled representation of the original /// data, but when scaling down, aliasing will occur and the data should be /// low-pass-filtered adequately before processing. /// /// Note that this code is specific to b-spline evaluators and relies /// on evaluator_type offering several b-spline specific methods which /// are not present in other interpolators, like split() and /// obtain_weights(). Since the weight generation for b-splines can /// be done separately for each axis and is a computationally intensive /// task, precalculating these per-axis weights makes sense. Coding for /// the general case (other unary functors), the only achievement is /// the permutation of the partial coordinates, so little is gained, /// and a transform where the indices are used to pick up /// the coordinates can be written easily: have a unary_functor taking /// discrete coordinates, 'loaded' with the per-axis coordinates, and an /// eval routine using the picked coordinates. 
This scheme is implemented /// further below, in class grid_eval_functor and gen_grid_eval() using it. template < typename evaluator_type > void grid_eval ( grid_spec < evaluator_type::dim_in , typename evaluator_type::in_ele_type > & grid , const evaluator_type & itp , vigra::MultiArrayView < evaluator_type::dim_in , typename evaluator_type::trg_type > & result ) { enum { dimension = evaluator_type::dim_in } ; // make sure the grid specification has enough coordinates for ( int d = 0 ; d < dimension ; d++ ) assert ( grid[d].size() >= result.shape ( d ) ) ; shape_range_type < dimension > range ( shape_type < dimension > () , result.shape() ) ; multithread ( detail::st_grid_eval < evaluator_type > , VSPLINE_DEFAULT_PARTITIONER < dimension > , ncores * 8 , range , &grid , &itp , &result ) ; } /// generalized grid evaluation. While grid_eval above specifically uses /// b-spline evaluation, saving time by precalculating weights and offsets, /// generalized grid evaluation can use any vspline::unary_functor on the /// grid positions. If this functor happens to be a b-spline evaluator, the /// result will be the same as the result obtained from using grid_eval, /// but the calculation takes longer. /// /// The implementation is simple: we wrap the 'inner' functor providing /// evaluation at a grid location in an outer functor 'grid_eval_functor', /// which receives discrete coordinates, picks the corresponding grid /// coordinates and delegates to the inner functor to obtain the result. /// The outer functor is then used with an index-based transform to fill /// the target array. /// /// This is a good example for the use of functional programming in vspline, /// as it demonstrates wrapping of one functor in another and the use of /// the combined functor with vspline::transform. It's also nice to have, /// since it offers a straightforward equivalent implementation of grid_eval /// to double-check that grid_eval functions correctly, as we do in scope_test.
template < typename _inner_type , typename ic_type = int > struct grid_eval_functor : public vspline::unary_functor < typename vspline::canonical_type < ic_type , _inner_type::dim_in > , typename _inner_type::out_type , _inner_type::vsize > { typedef _inner_type inner_type ; enum { vsize = inner_type::vsize } ; enum { dimension = inner_type::dim_in } ; typedef typename vspline::canonical_type < ic_type , dimension > in_type ; typedef typename inner_type::out_type out_type ; typedef typename inner_type::in_ele_type rc_type ; typedef typename vspline::unary_functor < in_type , out_type , vsize > base_type ; static_assert ( std::is_integral < ic_type > :: value , "grid_eval_functor: must use integral coordinates" ) ; const inner_type inner ; typedef grid_spec < inner_type::dim_in , typename inner_type::in_ele_type > grid_spec_t ; grid_spec_t grid ; grid_eval_functor ( grid_spec_t & _grid , const inner_type & _inner ) : grid ( _grid ) , inner ( _inner ) { } ; void eval ( const in_type & c , out_type & result ) const { typename inner_type::in_type cc ; // for uniform access, we use reinterpretations of the coordinates // as nD types, even if they are only 1D. This is only used to // fill in 'cc', the coordinate to be fed to 'inner'.
typedef typename base_type::in_nd_ele_type nd_ic_type ; typedef typename inner_type::in_nd_ele_type nd_rc_type ; const nd_ic_type & nd_c ( reinterpret_cast < const nd_ic_type & > ( c ) ) ; nd_rc_type & nd_cc ( reinterpret_cast < nd_rc_type & > ( cc ) ) ; for ( int d = 0 ; d < dimension ; d++ ) nd_cc [ d ] = grid [ d ] [ nd_c[d] ] ; inner.eval ( cc , result ) ; } template < typename = std::enable_if < ( vsize > 1 ) > > void eval ( const typename base_type::in_v & c , typename base_type::out_v & result ) const { typename inner_type::in_v cc ; typedef typename base_type::in_nd_ele_v nd_ic_v ; typedef typename inner_type::in_nd_ele_v nd_rc_v ; const nd_ic_v & nd_c ( reinterpret_cast < const nd_ic_v & > ( c ) ) ; nd_rc_v & nd_cc ( reinterpret_cast < nd_rc_v & > ( cc ) ) ; // TODO: we might optimize in two ways: // if the grid data are contiguous, we can issue a gather, // and if the coordinates above dimension 0 are equal for all e, // we can assign a scalar to nd_cc[d] for d > 0. for ( int d = 0 ; d < dimension ; d++ ) for ( int e = 0 ; e < vsize ; e++ ) nd_cc[d][e] = grid[d][ nd_c[d][e] ] ; inner.eval ( cc , result ) ; } } ; /// generalized grid evaluation. The production of result values from /// input values is done by an instance of grid_eval_functor, see above. /// The template argument, ev_type, has to be a functor (usually this /// will be a vspline::unary_functor). If the functor's in_type has /// dim_in components, grid_spec must also point to dim_in pointers, /// since ev's input is put together by picking a value from each /// of the arrays grid_spec points to. The result obviously has to have /// as many dimensions. 
template < typename ev_type > void gen_grid_eval ( grid_spec < ev_type::dim_in , typename ev_type::in_ele_type > & grid , const ev_type & ev , vigra::MultiArrayView < ev_type::dim_in , typename ev_type::out_type > & result ) { // make sure the grid specification has enough coordinates for ( int d = 0 ; d < ev_type::dim_in ; d++ ) assert ( grid[d].size() >= result.shape ( d ) ) ; // set up the grid evaluation functor and use it with 'transform' grid_eval_functor < ev_type > gev ( grid , ev ) ; vspline::transform ( gev , result ) ; } /// deprecated previous version taking the grid specification as /// a pointer to pointers. These will go with the 0.4.x series template < typename ev_type > void grid_eval ( typename ev_type::in_ele_type ** const p_grid_spec , const ev_type & ev , vigra::MultiArrayView < ev_type::dim_in , typename ev_type::out_type > & result ) { typedef typename ev_type::in_ele_type rc_type ; vspline::grid_spec < ev_type::dim_in , rc_type > grid_spec ; for ( int i = 0 ; i < ev_type::dim_in ; i++ ) { vigra::TinyVector < std::ptrdiff_t , 1 > sz ( result.shape(i) ) ; grid_spec[i] = vigra::MultiArrayView < 1 , rc_type > ( sz , p_grid_spec[i] ) ; } grid_eval ( grid_spec , ev , result ) ; } /// deprecated previous version taking the grid specification as /// a pointer to pointers. These will go with the 0.4.x series template < typename ev_type > void gen_grid_eval ( typename ev_type::in_ele_type ** const p_grid_spec , const ev_type & ev , vigra::MultiArrayView < ev_type::dim_in , typename ev_type::out_type > & result ) { typedef typename ev_type::in_ele_type rc_type ; vspline::grid_spec < ev_type::dim_in , rc_type > grid_spec ; for ( int i = 0 ; i < ev_type::dim_in ; i++ ) { vigra::TinyVector < std::ptrdiff_t , 1 > sz ( result.shape(i) ) ; grid_spec[i] = vigra::MultiArrayView < 1 , rc_type > ( sz , p_grid_spec[i] ) ; } gen_grid_eval ( grid_spec , ev , result ) ; } /// restore restores the original data from the b-spline coefficients. 
/// This is done efficiently using a separable convolution; the kernel /// is simply a unit-spaced sampling of the basis function. /// Since the filter uses internal buffering, using this routine /// in-place is safe - meaning that 'target' may be bspl.core itself. /// math_ele_type, the data type for performing the actual maths on the /// buffered data, and the type the data are converted to when they /// are placed into the buffer, can be chosen, but I could not detect /// any real benefits from using anything but the default, which is to /// leave the data in their 'native' type. /// /// An alternative way to restore is running an index-based /// transform with an evaluator for the spline. This is much /// less efficient, but the effect is the same: /// /// auto ev = vspline::make_evaluator ( bspl ) ; /// vspline::transform ( ev , target ) ; /// /// Note that vsize, the vectorization width, can be passed explicitly. /// If Vc is in use and math_ele_type can be used with hardware /// vectorization, the arithmetic will be done with Vc::SimdArrays /// of the given size. Otherwise 'goading' will be used: the data are /// presented in TinyVectors of vsize math_ele_type, hoping that the /// compiler may autovectorize the operation. /// /// 'math_ele_type', the type used for arithmetic inside the filter, /// defaults to the vigra RealPromote type of value_type's elementary. /// This ensures appropriate treatment of integral-valued splines.
// TODO hardcoded default vsize template < unsigned int dimension , typename value_type , typename math_ele_type = typename vigra::NumericTraits < typename vigra::ExpandElementResult < value_type > :: type > :: RealPromote , size_t vsize = vspline::vector_traits::size > void restore ( const vspline::bspline < value_type , dimension > & bspl , vigra::MultiArrayView < dimension , value_type > & target ) { if ( target.shape() != bspl.core.shape() ) throw shape_mismatch ( "restore: spline's core shape and target array shape must match" ) ; if ( bspl.spline_degree < 2 ) { // we can handle the degree 0 and 1 cases very efficiently, // since we needn't apply a filter at all. This is an // optimization, the filter code would still perform // correctly without it. if ( (void*) ( bspl.core.data() ) != (void*) ( target.data() ) ) { // operation is not in-place, copy data to target target = bspl.core ; } return ; } // first assemble the arguments for the filter int degree = bspl.spline_degree ; int headroom = degree / 2 ; int ksize = headroom * 2 + 1 ; xlf_type kernel [ ksize ] ; // pick the precomputed basis function values for the kernel. // Note how the values in precomputed_basis_function_values // (see poles.h) are provided at half-unit steps, hence the // index acrobatics. for ( int k = - headroom ; k <= headroom ; k++ ) { int pick = 2 * std::abs ( k ) ; kernel [ k + headroom ] = vspline_constants ::precomputed_basis_function_values [ degree ] [ pick ] ; } // the arguments have to be passed one per axis. While most // arguments are the same throughout, the boundary conditions // may be different for each axis. 
std::vector < vspline::fir_filter_specs > vspecs ; for ( int axis = 0 ; axis < dimension ; axis++ ) { vspecs.push_back ( vspline::fir_filter_specs ( bspl.bcv [ axis ] , ksize , headroom , kernel ) ) ; } // KFJ 2018-05-08 with the automatic use of vectorization the // distinction whether math_ele_type is 'vectorizable' or not // is no longer needed: simdized_type will be a Vc::SimdArray // if possible, a vspline::simd_tv otherwise. typedef typename vspline::convolve < vspline::simdized_type , math_ele_type , vsize > filter_type ; // now we have the filter's type, create an instance and // use it to affect the restoration of the original data. vspline::filter < value_type , value_type , dimension , filter_type > ( bspl.core , target , vspecs ) ; } /// overload of 'restore' writing the result of the operation back to /// the array which is passed in. This looks like an in-place operation, /// but the data are in fact moved to a buffer stripe by stripe, then /// the arithmetic is done on the buffer and finally the buffer is /// written back. This is repeated for each dimension of the array. template < int dimension , typename value_type , typename math_ele_type = typename vigra::NumericTraits < typename vigra::ExpandElementResult < value_type > :: type > :: RealPromote , size_t vsize = vspline::vector_traits::size > void restore ( vspline::bspline < value_type , dimension > & bspl ) { restore < dimension , value_type , math_ele_type , vsize > ( bspl , bspl.core ) ; } } ; // end of namespace vspline #endif // VSPLINE_TRANSFORM_H kfj-vspline-4b365417c271/unary_functor.h000066400000000000000000001152411333775006700177660ustar00rootroot00000000000000/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. 
Jahnke */ /* */ /* The git repository for this software is at */ /* */ /* https://bitbucket.org/kfj/vspline */ /* */ /* Please direct questions, bug reports, and contributions to */ /* */ /* kfjahnke+vspline@gmail.com */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /*! \file unary_functor.h \brief interface definition for unary functors vspline's evaluation and remapping code relies on a unary functor template which is used as the base for vspline::evaluator and also constitutes the type of object accepted by most of the functions in transform.h. This template produces functors which are meant to yield a single output for a single input, where both the input and output types may be single types or vigra::TinyVectors, and their elementary types may be vectorized. 
The functors are expected to provide methods named eval() which are capable of performing the required functionality. These eval routines take both their input and output by reference - the input is taken by const &, and the output as plain &. The result type of the eval routines is void. While such unary functors can be hand-coded, the class template 'unary_functor' provides services to create such functors in a uniform way, with a specific system of associated types and some convenience code. Using unary_functor is meant to facilitate the creation of the unary functors used in vspline. Using unary_functor generates objects which can be easily combined into more complex unary functors; a typical use would be to 'chain' two unary_functors, see class template 'chain_type' below, which also provides an example for the use of unary_functor. class unary_functor takes three template arguments: - the argument type, IN - the result type, OUT - the number of fundamentals (float, int etc.) in a vector, _vsize The vectorized argument and result type are deduced from IN, OUT and _vsize by querying vspline::vector_traits. When using Vc (-DUSE_VC), these types will be Vc::SimdArrays if the elementary type can be used to form a SimdArray. Otherwise vspline provides a fallback type emulating vectorization: vspline::simd_tv. This fallback type emulates just enough of SimdArray's capabilities to function as a replacement inside vspline's body of code. So where is eval() or operator()? Not in class unary_functor. The actual functionality is provided by the derived class. There is deliberately no code concerning evaluation in class unary_functor. My initial implementation had pure virtual functions to define the interface for evaluation, but this required explicitly providing the overloads in the derived class.
Simply omitting any reference to evaluation allows the derived class to accomplish evaluation with a template if the code is syntactically the same for vectorized and unvectorized operation. To users of concrete functors inheriting from unary_functor this makes no difference. The only drawback is that it's not possible to perform evaluation via a base class pointer or reference. But this is best avoided anyway because it degrades performance. If the need arises to have several unary_functors with the same template signature share a common type, there's a mechanism to make the internals opaque by 'grokking'. grokking provides a wrapper around a unary_functor which hides its type; vspline::grok_type directly inherits from unary_functor and the only template arguments are IN, OUT and _vsize. This hurts performance a little - just as calling via a base class pointer/reference would, but the code is outside class unary_functor and therefore only activated when needed. Class vspline::evaluator is itself coded as a vspline::unary_functor and can serve as another example for the use of the code in this file. Before the introduction of vspline::simd_tv, vectorization was done with Vc or not at all. Now vspline::vector_traits will produce Vc types if possible and vspline::simd_tv otherwise. This breaks code relying on the fallback to scalar without Vc, and it also breaks code that assumes that Vc is the sole method of vectorization. Extant code written for use with Vc should function as before as long as USE_VC is defined. It may be possible now to use such code even without Vc. This depends on how much of Vc::SimdArray's functionality is used. If such code runs without Vc, it may still not perform well and possibly even worse than scalar code.
This enables us to easily check if /// a type is a vspline::unary_functor without having to wrangle with /// unary_functor's template arguments. template < size_t _vsize > struct unary_functor_tag { } ; /// class unary_functor provides a functor object which offers a system /// of types for concrete unary functors derived from it. If vectorization /// isn't used, this is trivial, but with vectorization in use, we get /// vectorized types derived from plain IN and OUT via query of /// vspline::vector_traits. /// /// class unary_functor itself does not provide operator(); this is left to /// the concrete functors inheriting from unary_functor. It is expected /// that the derived classes provide evaluation capability, either as a /// method template or as (overloaded) method(s) 'eval'. eval is to be coded /// as taking its first argument as a const&, and writing its result to /// its second argument, which it receives by reference. eval's return type /// is void. Inside vspline, classes derived from unary_functor do provide /// operator(), so instances of these objects can be called with function /// call syntax as well. /// /// Why not lay down an interface with a pure virtual function eval() /// which derived classes would need to override? Suppose you had, in /// unary_functor, /// /// virtual void eval ( const in_type & in , out_type & out ) = 0 ; /// /// Then, in a derived class, you'd have to provide an override with this /// signature. Initially, this seems reasonable enough, but if you want to /// implement eval() as a member function template in the derived class, you /// still would have to provide the override (calling an instantiated version /// of your template), because your template won't be recognized as a viable /// way to override the pure virtual base class member function.
Since /// providing eval as a template is common (oftentimes vectorized and /// unvectorized code are the same) I've decided against having virtual eval /// routines, to avoid the need for explicitly overriding them in derived /// classes which provide eval() as a template. /// /// How about providing operator() in unary_functor? We might add the derived /// class to the template argument list and use unary_functor with CRTP. I've /// decided against this and instead provide callability as a mixin to be /// used as needed. This keeps the complexity of unary_functor-derived objects /// low, adding the extra capability only where it's deemed appropriate. For /// the mixin, see class 'callable' further down. /// /// With no virtual member functions, class unary_functor becomes very simple, /// which is desirable from a design standpoint, and also makes unary_functors /// smaller, avoiding the creation of the virtual function table. /// /// The type system used in unary_functor is taken from vspline::vector_traits, /// additionally prefixing the types with in_ and out_, for input and output /// types. The other elements of the type names are the same as in /// vector_traits. template < typename IN , // argument or input type typename OUT , // result type size_t _vsize = vspline::vector_traits < IN > :: size > struct unary_functor : public unary_functor_tag < _vsize > { // number of fundamentals in simdized data. If vsize is 1, the vectorized // types will 'collapse' to the unvectorized types. enum { vsize = _vsize } ; // number of dimensions. This may well be different for IN and OUT. enum { dim_in = vspline::vector_traits < IN > :: dimension } ; enum { dim_out = vspline::vector_traits < OUT > :: dimension } ; // typedefs for incoming (argument) and outgoing (result) type. These two types // are non-vectorized types, like vigra::TinyVector < float , 2 >.
Since such types // consist of elements of the same type, the corresponding vectorized type can be // easily automatically determined. typedef IN in_type ; typedef OUT out_type ; // elementary types of same. we rely on vspline::vector_traits to provide // these types. typedef typename vspline::vector_traits < IN > :: ele_type in_ele_type ; typedef typename vspline::vector_traits < OUT > :: ele_type out_ele_type ; // 'synthetic' types for input and output. These are always TinyVectors, // possibly of only one element, of the elementary type of in_type/out_type. // On top of providing a uniform container type (the TinyVector) the // synthetic type is also 'unaware' of any specific meaning the 'true' // input/output type may have, and arithmetic operations on the synthetic // types won't clash with arithmetic defined for the 'true' types. typedef vigra::TinyVector < in_ele_type , dim_in > in_nd_ele_type ; typedef vigra::TinyVector < out_ele_type , dim_out > out_nd_ele_type ; // for vectorized operation, we need a few extra typedefs. I use a _v // suffix instead of the _type suffix above to indicate vectorized types. // If vsize is 1, the _v types simply collapse to the unvectorized // types, having them does no harm, but it's not safe to assume that, // for example, in_v and in_type are in fact different types. /// a simdized type of the elementary type of result_type, /// which is used for coefficients and results. this is fixed via /// the traits class vector_traits (in vector.h). Note how we derive /// this type using vsize from the template argument, not what /// vspline::vector_traits deems appropriate for ele_type - though /// both numbers will be the same in most cases. typedef typename vector_traits < IN , vsize > :: ele_v in_ele_v ; typedef typename vector_traits < OUT , vsize > :: ele_v out_ele_v ; // 'synthetic' types for simdized input and output. These are always // TinyVectors, possibly of only one element, of the simdized input // and output type. 
typedef typename vector_traits < IN , vsize > :: nd_ele_v in_nd_ele_v ; typedef typename vector_traits < OUT , vsize > :: nd_ele_v out_nd_ele_v ; /// vectorized in_type and out_type. vspline::vector_traits supplies these /// types so that multidimensional/multichannel data come as vigra::TinyVectors, /// while 'singular' data won't be made into TinyVectors of one element. typedef typename vector_traits < IN , vsize > :: type in_v ; typedef typename vector_traits < OUT , vsize > :: type out_v ; /// vsize wide vector of ints, used for gather/scatter indexes typedef typename vector_traits < int , vsize > :: ele_v ic_v ; } ; // KFJ 2018-07-14 // To deal with an issue with cppyy, which has trouble processing // templated operator() into an overloaded callable, I introduce this // mixin, which specifically provides two distinct operator() overloads. // This is also a better way to introduce the callable quality, since on // the side of the derived class it requires only inheriting from the // mixin, rather than the verbose templated operator() I used previously. // This is still experimental. /// mixin 'callable' is used with CRTP: it serves as additional base to /// unary functors which are meant to provide operator() and takes the /// derived class as its first template argument, followed by the /// argument types and vectorization width, so that the parameter and /// return type for operator() and - if vsize is greater than 1 - its /// vectorized overload can be produced. /// This formulation has the advantage of not having to rely on the /// 'out_type_of' mechanism I was using before and provides precisely /// the operator() overload(s) which are appropriate.
template < class derived , typename IN , typename OUT , size_t vsize > struct callable { // using a cl_ prefix here for the vectorized types to avoid a name // clash with class unary_functor, which also defines in_v, out_v typedef typename vector_traits < IN , vsize > :: type cl_in_v ; typedef typename vector_traits < OUT , vsize > :: type cl_out_v ; OUT operator() ( const IN & in ) { auto self = static_cast < derived * > ( this ) ; OUT out ; self->eval ( in , out ) ; return out ; } template < typename = std::enable_if < ( vsize > 1 ) > > cl_out_v operator() ( const cl_in_v & in ) { auto self = static_cast < derived * > ( this ) ; cl_out_v out ; self->eval ( in , out ) ; return out ; } } ; /// class chain_type is a helper class to pass one unary functor's result /// as argument to another one. We rely on T1 and T2 to provide a few of the /// standard types used in unary functors. Typically, T1 and T2 will both be /// vspline::unary_functors, but the type requirements could also be fulfilled /// manually. /// /// Note how callability is introduced via the mixin 'vspline::callable'. /// The inheritance definition looks confusing, the template arg list reads as: /// 'the derived class, followed by the arguments needed to determine the /// call signature(s)'. See vspline::callable above. 
template < typename T1 , typename T2 > struct chain_type : public vspline::unary_functor < typename T1::in_type , typename T2::out_type , T1::vsize > , public vspline::callable < chain_type < T1 , T2 > , typename T1::in_type , typename T2::out_type , T1::vsize > { // definition base_type enum { vsize = T1::vsize } ; typedef vspline::unary_functor < typename T1::in_type , typename T2::out_type , vsize > base_type ; using typename base_type::in_type ; using typename base_type::out_type ; using typename base_type::in_v ; using typename base_type::out_v ; // we require both functors to share the same vectorization width static_assert ( T1::vsize == T2::vsize , "can only chain unary_functors with the same vector width" ) ; static_assert ( std::is_same < typename T1::out_type , typename T2::in_type > :: value , "chain: output of first functor must match input of second functor" ) ; typedef typename T1::out_type intermediate_type ; typedef typename T1::out_v intermediate_v ; // hold the two functors by value const T1 t1 ; const T2 t2 ; // the constructor initializes them chain_type ( const T1 & _t1 , const T2 & _t2 ) : t1 ( _t1 ) , t2 ( _t2 ) { } ; // the actual eval needs a bit of trickery to determine the type of // the intermediate type from the type of the first argument. void eval ( const in_type & argument , out_type & result ) const { intermediate_type intermediate ; t1.eval ( argument , intermediate ) ; // evaluate first functor into intermediate t2.eval ( intermediate , result ) ; // feed it as input to second functor } template < typename = std::enable_if < ( vsize > 1 ) > > void eval ( const in_v & argument , out_v & result ) const { intermediate_v intermediate ; t1.eval ( argument , intermediate ) ; // evaluate first functor into intermediate t2.eval ( intermediate , result ) ; // feed it as input to second functor } } ; /// chain is a factory function yielding the result of chaining /// two unary_functors. 
template < class T1 , class T2 > vspline::chain_type < T1 , T2 > chain ( const T1 & t1 , const T2 & t2 ) { return vspline::chain_type < T1 , T2 > ( t1 , t2 ) ; } /// using operator overloading, we can exploit operator+'s semantics /// to chain several unary functors. We need to specifically enable /// this for types derived from unary_functor_tag to avoid a catch-all /// situation. template < typename T1 , typename T2 , typename = std::enable_if < std::is_base_of < vspline::unary_functor_tag < T2::vsize > , T1 > :: value && std::is_base_of < vspline::unary_functor_tag < T1::vsize > , T2 > :: value > > vspline::chain_type < T1 , T2 > operator+ ( const T1 & t1 , const T2 & t2 ) { return vspline::chain ( t1 , t2 ) ; } // /// sometimes, vectorized code for a vspline::unary_functor is not at hand // /// for some specific evaluation. class broadcast_type can broadcast unvectorized // /// evaluation code, so that vectorized data can be procesed with this code, // /// albeit less efficiently. // // template < class inner_type , size_t _vsize > // struct broadcast_type // : public vspline::unary_functor < typename inner_type::in_type , // typename inner_type::out_type , // _vsize > // { // // definition of in_type, out_type, vsize and base_type // // typedef typename inner_type::in_type in_type ; // typedef typename inner_type::out_type out_type ; // typedef typename inner_type::in_ele_type in_ele_type ; // typedef typename inner_type::out_ele_type out_ele_type ; // enum { dim_in = inner_type::dim_in } ; // enum { dim_out = inner_type::dim_out } ; // enum { vsize = _vsize } ; // // typedef vspline::unary_functor // < in_type , out_type , vsize > base_type ; // // using typename base_type::in_v ; // using typename base_type::out_v ; // // const inner_type inner ; // // broadcast_type ( const inner_type & _inner ) // : inner ( _inner ) // { } ; // // /// single-value evaluation simply delegates to inner // // void eval ( const in_type & in , out_type & out ) const // { // 
inner.eval ( in , out ) ; // } // // // vector_traits for in_type and out_type // // typedef typename vspline::vector_traits < in_type , _vsize > iv_traits ; // typedef typename vspline::vector_traits < out_type , _vsize > ov_traits ; // // // now we implement the actual broadcast // // /// vectorized evaluation takes an 'in_v' as it's argument and stores // /// it's output to an 'out_v'. We specifically enable this routine // /// for cases where 'vsize', the vectorization width, is greater than // /// one. This is necessary, since with Vc enabled, vsize may still // /// be one - for example if if the data type requires it - and not // /// using the enable_if statement would produce a second 'eval' // /// overload with identical signature to the one above. // // template < typename = std::enable_if < ( vsize > 1 ) > > // void eval ( const in_v & in , // out_v & out ) const // { // // we want TinyVectors even if there is only one channel. // // this way we can iterate over the channels. // // typedef typename iv_traits::nd_ele_v iv_type ; // typedef typename ov_traits::nd_ele_v ov_type ; // // // we reinterpret input and output as nD types. in_v/out_v are // // plain SIMD types if in_type/out_type is fundamental; here we // // want a TinyVector of one element in this case. 
// // const iv_type & iv ( reinterpret_cast < const iv_type & > ( in ) ) ; // ov_type & ov ( reinterpret_cast < ov_type & > ( out ) ) ; // // // now the broadcast: // // // we use the equivalent reinterpretation to un-simdized // // input/output data // // typename iv_traits::nd_ele_type i ; // typename ov_traits::nd_ele_type o ; // // in_type & iref ( reinterpret_cast < in_type & > ( i ) ) ; // out_type & oref ( reinterpret_cast < out_type & > ( o ) ) ; // // for ( int e = 0 ; e < _vsize ; e++ ) // { // // extract the eth input value from the simdized input // // for ( int d = 0 ; d < iv_traits::dimension ; d++ ) // i[d] = iv[d][e] ; // // // process it with eval, passing the eval-compatible references // // inner.eval ( iref , oref ) ; // // // now distribute eval's result to the SIMD output // // for ( int d = 0 ; d < ov_traits::dimension ; d++ ) // ov[d][e] = o[d] ; // } // } // // } ; // // /// type of a std::function for unvectorized evaluation: // // template < class IN , class OUT > // using eval_type = std::function < void ( const IN & , OUT & ) > ; // // /// helper class hold_type holds a single-element evaluation function // // template < class IN , class OUT > // struct hold_type // : public vspline::unary_functor < IN , OUT , 1 > // { // const eval_type < IN , OUT > eval ; // // hold_type ( eval_type < IN , OUT > _eval ) // : eval ( _eval ) // { } ; // } ; // // /// factory function to create a broadcast_type from another vspline::unary_functor // /// This will pick the other functor's unvectorized eval routine and broadcast it, // /// The vectorized eval routine of the other functor (if present) is ignored. // // template < class inner_type , size_t _vsize > // broadcast_type < inner_type , _vsize > broadcast ( const inner_type & inner ) // { // return broadcast_type < inner_type , _vsize > ( inner ) ; // } // // /// factory function to create a broadcast_type from a std::function // /// implementing the unvectorized evaluation. 
// /// to broadcast a single-value evaluation function, we package it // /// in a hold_type, which broadcast can handle. // // template < class IN , class OUT , size_t _vsize > // broadcast_type < hold_type < IN , OUT > , _vsize > // broadcast ( eval_type < IN , OUT > _eval ) // { // return broadcast_type < hold_type < IN , OUT > , _vsize > // ( hold_type < IN , OUT > ( _eval ) ) ; // } /// eval_wrap is a helper function template, wrapping an 'ordinary' /// function which returns some value given some input, in a void function /// taking input as const reference and writing output to a reference, /// which is the signature used for evaluation in vspline::unary_functors. template < class IN , class OUT > std::function < void ( const IN& , OUT& ) > eval_wrap ( std::function < OUT ( const IN& ) > f ) { return [f] ( const IN& in , OUT& out ) { out = f ( in ) ; } ; } /// class grok_type is a helper class wrapping a vspline::unary_functor /// so that its type becomes opaque - a technique called 'type erasure', /// here applied to vspline::unary_functors with their specific /// capability of providing both vectorized and unvectorized operation /// in one common object. /// /// While 'grokking' a unary_functor may degrade performance slightly, /// the resulting type is less complex, and when working on complex /// constructs involving several unary_functors, it can be helpful to /// wrap the whole bunch into a grok_type for some time to make compiler /// messages more palatable. I even suspect that the resulting functor, /// which simply delegates to two std::functions, may optimize better at /// times than a more complex functor in the 'grokkee'. 
/// /// Performance aside, 'grokking' a vspline::unary_functor produces a /// simple, consistent type that can hold *any* unary_functor with the /// given input and output type(s), so it allows holding and using a /// variety of (intrinsically differently typed) functors at runtime /// via a common handle which is a vspline::unary_functor itself and /// can be passed to the transform-type routines. With unary_functors /// being first-class, copyable objects, this also makes it possible /// to pass around unary_functors between different TUs where user /// code can provide new functors at will which can simply be used /// without having to recompile to make their type known, at the cost /// of a call through a std::function. /// /// grok_type also provides a convenient way to introduce functors into /// vspline. Since the functionality is implemented with std::functions, /// we allow direct initialization of these std::functions on top of /// 'grokking' the capabilities of another unary_functor via lambda /// expressions. 'Ordinary' functions can also be grokked. /// /// For grok_type objects where _vsize is greater than 1, there are /// constructor overloads taking only a single function. These /// constructors broadcast the unvectorized function to process /// vectorized data, providing a quick way to produce code which /// runs with vector data, albeit less efficiently than true vector /// code. /// /// Finally, for convenience, grok_type also provides operator(), /// to use the grok_type object with function call syntax, and it /// also provides the common 'eval' routine(s), just like any other /// unary_functor. 
template < typename IN , // argument or input type typename OUT , // result type size_t _vsize = vspline::vector_traits < IN > :: size > struct grok_type : public vspline::unary_functor < IN , OUT , _vsize > , public vspline::callable < grok_type < IN , OUT , _vsize > , IN , OUT , _vsize > { typedef vspline::unary_functor < IN , OUT , _vsize > base_type ; enum { vsize = _vsize } ; using typename base_type::in_type ; using typename base_type::out_type ; using typename base_type::in_v ; using typename base_type::out_v ; typedef std::function < void ( const in_type & , out_type & ) > eval_type ; typedef std::function < out_type ( const in_type & ) > call_type ; eval_type _ev ; // given these types, we can define the types for the std::function // we will use to wrap the grokkee's evaluation code in. typedef std::function < void ( const in_v & , out_v & ) > v_eval_type ; // this is the class member holding the std::function: v_eval_type _v_ev ; // we also define a std::function type using 'normal' call/return syntax typedef std::function < out_v ( const in_v & ) > v_call_type ; /// we provide a default constructor so we can create an empty /// grok_type and assign to it later. Calling the empty grok_type's /// eval will result in an exception. grok_type() { } ; /// direct initialization of the internal evaluation functions /// this overload, with two arguments, specifies the unvectorized /// and the vectorized evaluation function explicitly. grok_type ( const eval_type & fev , const v_eval_type & vfev ) : _ev ( fev ) , _v_ev ( vfev ) { } ; /// constructor taking a call_type and a v_call_type, /// initializing the two std::functions _ev and _v_ev /// with wrappers around these arguments which provide /// the 'standard' vspline evaluation functor signature grok_type ( call_type f , v_call_type vf ) : _ev ( eval_wrap ( f ) ) , _v_ev ( eval_wrap ( vf ) ) { } ; /// constructor from 'grokkee' using lambda expressions /// to initialize the std::functions _ev and _v_ev. 
/// we enable this if grokkee_type is a vspline::unary_functor template < class grokkee_type , typename std::enable_if < std::is_base_of < vspline::unary_functor_tag < vsize > , grokkee_type > :: value , int > :: type = 0 > grok_type ( grokkee_type grokkee ) : _ev ( [ grokkee ] ( const IN & in , OUT & out ) { grokkee.eval ( in , out ) ; } ) , _v_ev ( [ grokkee ] ( const in_v & in , out_v & out ) { grokkee.eval ( in , out ) ; } ) { } ; // /// constructor taking only an unvectorized evaluation function. // /// this function is broadcast, providing evaluation of SIMD types // /// with non-vector code, which is less efficient. // // grok_type ( const eval_type & fev ) // : _ev ( fev ) , // _v_ev ( [ fev ] ( const in_v & in , out_v & out ) // { vspline::broadcast < IN , OUT , vsize > (fev) // .eval ( in , out ) ; } ) // { } ; // // /// constructor taking only one call_type, which is also broadcast, // /// since the call_type std::function is wrapped to provide a // /// std::function with vspline's standard evaluation functor signature // /// and the result is fed to the single-argument functor above. // // grok_type ( const call_type & f ) // : grok_type ( eval_wrap ( f ) ) // { } ; /// unvectorized evaluation. This is delegated to _ev. void eval ( const IN & i , OUT & o ) const { _ev ( i , o ) ; } /// vectorized evaluation function template /// the eval overload above will catch calls with (in_type, out_type) /// while this overload will catch vectorized evaluations. template < typename = std::enable_if < ( vsize > 1 ) > > void eval ( const in_v & i , out_v & o ) const { _v_ev ( i , o ) ; } } ; /// specialization of grok_type for _vsize == 1 /// this is the only possible specialization if vectorization is not used. /// here we don't use _v_ev but only the unvectorized evaluation. 
template < typename IN , // argument or input type typename OUT // result type > struct grok_type < IN , OUT , 1 > : public vspline::unary_functor < IN , OUT , 1 > , public vspline::callable < grok_type < IN , OUT , 1 > , IN , OUT , 1 > { typedef vspline::unary_functor < IN , OUT , 1 > base_type ; enum { vsize = 1 } ; using typename base_type::in_type ; using typename base_type::out_type ; using typename base_type::in_v ; using typename base_type::out_v ; typedef std::function < void ( const in_type & , out_type & ) > eval_type ; typedef std::function < out_type ( const in_type & ) > call_type ; eval_type _ev ; grok_type() { } ; template < class grokkee_type , typename std::enable_if < std::is_base_of < vspline::unary_functor_tag < 1 > , grokkee_type > :: value , int > :: type = 0 > grok_type ( grokkee_type grokkee ) : _ev ( [ grokkee ] ( const IN & in , OUT & out ) { grokkee.eval ( in , out ) ; } ) { } ; grok_type ( const eval_type & fev ) : _ev ( fev ) { } ; grok_type ( call_type f ) : _ev ( eval_wrap ( f ) ) { } ; void eval ( const IN & i , OUT & o ) const { _ev ( i , o ) ; } } ; /// grok() is the corresponding factory function, wrapping grokkee /// in a vspline::grok_type. template < class grokkee_type > vspline::grok_type < typename grokkee_type::in_type , typename grokkee_type::out_type , grokkee_type::vsize > grok ( const grokkee_type & grokkee ) { return vspline::grok_type < typename grokkee_type::in_type , typename grokkee_type::out_type , grokkee_type::vsize > ( grokkee ) ; } /// amplify_type amplifies its input with a factor. If the data are /// multi-channel, the factor is multi-channel as well and the channels /// are amplified by the corresponding elements of the factor. /// I added this class to make working with integer-valued splines more /// comfortable - if these splines are prefiltered with 'boost', the /// effect of the boost has to be reversed at some point, and amplify_type /// does just that when you use 1/boost as the 'factor'. 
template < class _in_type , class _out_type = _in_type , class _math_type = _in_type , size_t _vsize = vspline::vector_traits < _in_type > :: vsize > struct amplify_type : public vspline::unary_functor < _in_type , _out_type , _vsize > , public vspline::callable < amplify_type < _in_type , _out_type , _math_type , _vsize > , _in_type , _out_type , _vsize > { typedef typename vspline::unary_functor < _in_type , _out_type , _vsize > base_type ; enum { vsize = _vsize } ; enum { dimension = base_type::dim_in } ; // TODO: might assert common dimensionality using typename base_type::in_type ; using typename base_type::out_type ; using typename base_type::in_v ; using typename base_type::out_v ; using typename base_type::in_nd_ele_v ; using typename base_type::out_nd_ele_v ; typedef _math_type math_type ; typedef typename vigra::ExpandElementResult < math_type > :: type math_ele_type ; typedef vigra::TinyVector < math_ele_type , dimension > math_nd_ele_type ; typedef typename vspline::vector_traits < math_ele_type , vsize > :: type math_ele_v ; const math_type factor ; // constructors initialize factor. If dimension is greater than 1, // we have two constructors, one taking a TinyVector, one taking // a single value for all dimensions. 
template < typename = std::enable_if < ( dimension > 1 ) > > amplify_type ( const math_type & _factor ) : factor ( _factor ) { } ; amplify_type ( const math_ele_type & _factor ) : factor ( _factor ) { } ; void eval ( const in_type & in , out_type & out ) const { out = out_type ( math_type ( in ) * factor ) ; } template < typename = std::enable_if < ( vsize > 1 ) > > void eval ( const in_v & in , out_v & out ) const { // we take a view to the arguments as TinyVectors, even if // the data are 'singular' const in_nd_ele_v & _in = reinterpret_cast < in_nd_ele_v const & > ( in ) ; const math_nd_ele_type & _factor = reinterpret_cast < math_nd_ele_type const & > ( factor ) ; out_nd_ele_v & _out = reinterpret_cast < out_nd_ele_v & > ( out ) ; // and perform the application of the factor element-wise for ( int i = 0 ; i < dimension ; i++ ) vspline::assign ( _out[i] , math_ele_v ( _in[i] ) * _factor[i] ) ; } } ; /// flip functor produces its input with component order reversed. /// This can be used to deal with situations where coordinates in /// the 'wrong' order have to be fed to a functor expecting the opposite /// order and should be a fast way of doing so, since the compiler can /// likely optimize it well. 
/// I added this class to provide simple handling of incoming NumPy /// coordinates, which are normally in reverse order of vigra coordinates template < typename _in_type , size_t _vsize = vspline::vector_traits < _in_type > :: vsize > struct flip : public vspline::unary_functor < _in_type , _in_type , _vsize > , public vspline::callable < flip < _in_type , _vsize > , _in_type , _in_type , _vsize > { typedef typename vspline::unary_functor < _in_type , _in_type , _vsize > base_type ; enum { vsize = _vsize } ; enum { dimension = base_type::dim_in } ; using typename base_type::in_type ; using typename base_type::out_type ; using typename base_type::in_v ; using typename base_type::out_v ; using typename base_type::in_nd_ele_type ; using typename base_type::out_nd_ele_type ; using typename base_type::in_nd_ele_v ; using typename base_type::out_nd_ele_v ; void eval ( const in_type & in_ , out_type & out ) const { // we need a copy of 'in' in case _in == out in_type in ( in_ ) ; // we take a view to the arguments as TinyVectors, even if // the data are 'singular' const in_nd_ele_type & _in = reinterpret_cast < in_nd_ele_type const & > ( in ) ; out_nd_ele_type & _out = reinterpret_cast < out_nd_ele_type & > ( out ) ; for ( int e = 0 ; e < dimension ; e++ ) _out [ e ] = _in [ dimension - e - 1 ] ; } template < typename = std::enable_if < ( vsize > 1 ) > > void eval ( const in_v & in_ , out_v & out ) const { // we need a copy of 'in' in case _in == out in_v in ( in_ ) ; // we take a view to the arguments as TinyVectors, even if // the data are 'singular' const in_nd_ele_v & _in = reinterpret_cast < in_nd_ele_v const & > ( in ) ; out_nd_ele_v & _out = reinterpret_cast < out_nd_ele_v & > ( out ) ; for ( int e = 0 ; e < dimension ; e++ ) vspline::assign ( _out [ e ] , _in [ dimension - e - 1 ] ) ; } } ; } ; // end of namespace vspline #endif // VSPLINE_UNARY_FUNCTOR_H 
/************************************************************************/ /* */ /* vspline - a set of generic tools for creation and evaluation */ /* of uniform b-splines */ /* */ /* Copyright 2015 - 2018 by Kay F. Jahnke */ /* */ /* The git repository for this software is at */ /* */ /* https://bitbucket.org/kfj/vspline */ /* */ /* Please direct questions, bug reports, and contributions to */ /* */ /* kfjahnke+vspline@gmail.com */ /* */ /* Permission is hereby granted, free of charge, to any person */ /* obtaining a copy of this software and associated documentation */ /* files (the "Software"), to deal in the Software without */ /* restriction, including without limitation the rights to use, */ /* copy, modify, merge, publish, distribute, sublicense, and/or */ /* sell copies of the Software, and to permit persons to whom the */ /* Software is furnished to do so, subject to the following */ /* conditions: */ /* */ /* The above copyright notice and this permission notice shall be */ /* included in all copies or substantial portions of the */ /* Software. */ /* */ /* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */ /* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */ /* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */ /* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */ /* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */ /* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */ /* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */ /* OTHER DEALINGS IN THE SOFTWARE. */ /* */ /************************************************************************/ /*! \file vector.h \brief code for horizontal vectorization in vspline vspline currently has three ways of approaching vectorization: - no vectorization. 
Scalar code is less complex since it does not have to aggregate the data into vectorization-friendly parcels, and for some data types, the performance is just as good as with vectorization. Use of scalar code results from setting the vectorization width to 1. This is usually a template argument going by the name 'vsize'. - Use of Vc for vectorization. This requires the presence of Vc during compilation and results in explicit vectorization for all elementary types Vc can handle. Vc provides code for several operations which are outside the scope of autovectorization, most prominently hardware gather and scatter operations, and the explicit vectorization with Vc makes sure that vectorization is indeed used whenever possible, rather than having to rely on the compiler to recognize the opportunity. Use of Vc has to be explicitly activated by defining USE_VC during compilation. Using this option usually produces the fastest code. The downside is the dependence on an external library which may or may not actually implement the intended vector operations with vector code for a given target: Newer processors may not yet be supported, or support may be implemented for part of the instructions only. Also, the Vc version coming from the distro's package management may not be up-to-date. Building processing pipelines based on Vc::SimdArray is, on the other hand, straightforward - the type is well-thought-out and there is good library support for many operations. Use of Vc triggers use of fallback code for elementary types which Vc can't vectorize - such types are pseudo-vectorized: - The third option is to produce code which is designed to be easily recognized by the compiler as amenable to autovectorization. 
This is a technique I call 'goading': data are processed in small aggregates of vector friendly size, resulting in inner loops which the autovectorization stage often recognizes, producing hardware vector code if the compiler flags allow for it and the compiler can generate code for the intended target. Since this approach relies entirely on the compiler's capability to autovectorize the (deliberately vectorization-friendly) code, the mileage varies. If it works, this is a clean and simple solution. A disadvantage is the use of class simd_tv for vectorization, which is mildly exotic and very much a vspline creature - building processing pipelines using this type will not be as effortless as using Vc::SimdArray. As long as you're not building your own functors to be used with vspline's family of transform-like functions, the precise mode of vectorization remains an internal issue and you needn't concern yourself with it beyond choosing whether you want vspline to use Vc or not, and choosing a suitable vectorization width if the default does not suit you. It's important to understand that using simd_tv is not simply mapping, say, pixels of three floats to simd_tv of three floats - that would be 'vertical' vectorization, which is represented by vspline's *scalar* code. Instead, vspline is coded to use *horizontal* vectorization, which produces vector data fitting the size of the vector unit's registers, where each element held by the vector has exactly the same meaning as every other: rather than vectors holding, say, the colour channels of a pixel, we have a 'red', a 'green' and a 'blue' vector holding, say, eight floats each. Horizontal vectorization is best explicitly coded, and if it is coded explicitly, the code structure itself suggests vectorization to the compiler. 
Using code like Vc gives more structure to this process and adds capabilities beyond the scope of autovectorization, but having the horizontal vectorization manifest in the code's structure already goes a long way, and if the 'structurally' vectorized code autovectorizes well, that may well be 'good enough' as it is. In my experience, it is often significantly faster than scalar code - provided the processor has vector units. Using Vc for SIMD aggregation is not always possible: - the user may not want/have Vc - Vc may not be able to vectorize a given fundamental struct simd_tv provides an implementation of a type exhibiting roughly the same interface as Vc::SimdArray, which is built on top of a vigra::TinyVector and uses loops to process the multiple data it holds. Many of these loops will be autovectorized, so the effect is quite similar to using 'proper' explicit SIMD code - the notable exception being gather/scatter access to memory, which - AFAICT - is not automatically translated to SIMD gathers and scatters. Most of the pseudo-SIMD capabilities are introduced by mixins, which might be useful to instrumentalize other 'host' types for SIMD emulation - std::valarray springs to mind, as it can do arithmetic, though this needs some finessing, as the binary operators return expression templates and the size is not a template argument. Using simd_tv as fallback if Vc is not available allows us to use most of the vectorized code in vspline as long as it is not Vc-specific. Vc-specific code is inside #ifdef USE_VC ... #endif preprocessor statements, and vspline provides an #else case where necessary (some code can be excluded completely from compilation if Vc is not used). Running vspline's vector code with simd_tv is slower than running it with Vc::SimdArray, but usually faster than running scalar code. 
I assume that the speed difference is largely due to the absence of hardware gather/scatter operations, which are not produced by autovectorization since their loop equivalents aren't recognized by the compilers as constructs which have a vectorized representation. This is new code; the emulation of Vc::SimdArray is not at all complete. What's provided here is the functionality of Vc::SimdArray which is actually used inside namespace vspline, so all vspline functions should work as expected. When using vspline's transform-like functions with functors using Vc code, the incomplete emulation will likely show, unless the use of Vc is limited to what's emulated in this header. The implementation of the mixins and the type itself are deliberately trivial for the time being to avoid introducing errors by trying to be comprehensive. If necessary, the type can evolve to perform better. Since simd_tv inherits from vigra::TinyVector, it's easy to 'break out' of functions receiving simd_tv as arguments: simd_tv references can simply be cast to vigra::TinyVector references and passed on as such. So if you have a vspline::unary_functor with a vectorized eval overload, you can cast the incoming argument to const TinyVector&, the outgoing argument to TinyVector&, and then work on the TinyVector references. This may be helpful if the simd_tv gets in the way. If you just want to use vspline's transform-like functions, this gives you an easy way of processing arrays with multithreading and aggregation without forcing you to deal with the simd_tv data type. Note that this header is included by vspline/common.h, so this code is available throughout vspline. */ #ifndef VSPLINE_VECTOR_H #define VSPLINE_VECTOR_H #ifdef USE_VC #include <Vc/Vc> // this enum will hold true or false, depending on whether the // translation unit including this header was compiled with USE_VC // defined or not. 
enum { vc_in_use = true } ; #else // #ifdef USE_VC enum { vc_in_use = false } ; #endif // #ifdef USE_VC namespace vspline { // The first section of code in this file provides the type 'simd_tv' // which emulates SIMD data types and operations using a vigra::TinyVector. // This type's capabilities are mostly provided as mixins, the type // definition follows after the mixins. /// generic_simd_memory_access is a mixin used to provide SIMD-typical /// memory access functionality to some template class X in a way which /// only relies on X being indexable. template < template < typename , size_t > class X , typename T , size_t N > struct generic_simd_memory_access { typedef X < T , N > derived_t ; /// generic load uses a loop: void load ( const T * const p_src ) { derived_t & di ( * ( static_cast < derived_t * > ( this ) ) ) ; for ( size_t i = 0 ; i < N ; i++ ) di [ i ] = p_src [ i ] ; } /// generic gather performs the gather operation using a loop. /// Note how index_type is introduced as a template argument, /// allowing any type which provides operator[] ( int ) template < class index_type > void gather ( const T * const p_src , const index_type & indexes ) { derived_t & di ( * ( static_cast < derived_t * > ( this ) ) ) ; for ( size_t i = 0 ; i < N ; i++ ) di [ i ] = p_src [ indexes [ i ] ] ; } /// store saves the content of the container to memory void store ( T * const p_src ) const { const derived_t & di ( * ( static_cast < const derived_t * > ( this ) ) ) ; for ( size_t i = 0 ; i < N ; i++ ) p_src [ i ] = di [ i ] ; } /// scatter is the reverse operation to gather, see the comments there. template < class index_type > void scatter ( T * const p_src , const index_type & indexes ) const { const derived_t & di ( * ( static_cast < const derived_t * > ( this ) ) ) ; for ( size_t i = 0 ; i < N ; i++ ) p_src [ indexes [ i ] ] = di [ i ] ; } } ; /// mixin 'compare' provides methods to produce a mask on comparing /// a vector with some other indexable object or a scalar. 
template < template < typename , size_t > class X , typename T , size_t N > struct compare { typedef X < T , N > derived_t ; typedef X < bool , N > mask_t ; // we take lhs and rhs by value. If the calling code passes anything that // can be used to construct derived_t, we'll receive derived_t here, so we // can safely index lhs and rhs and this single macro is enough: #define COMPARE_FUNC(OP,OPFUNC) \ friend mask_t OPFUNC ( derived_t lhs , \ derived_t rhs ) \ { \ mask_t m ; \ for ( size_t i = 0 ; i < N ; i++ ) \ m [ i ] = ( lhs [ i ] OP rhs [ i ] ) ; \ return m ; \ } COMPARE_FUNC(<,operator<) ; COMPARE_FUNC(<=,operator<=) ; COMPARE_FUNC(>,operator>) ; COMPARE_FUNC(>=,operator>=) ; COMPARE_FUNC(==,operator==) ; COMPARE_FUNC(!=,operator!=) ; #undef COMPARE_FUNC // TODO: test if providing comparison with T is faster than broadcasting // T arguments to derived_t by accepting the args per value } ; /// 'bitwise_op' 'rolls out' bitwise and, or and xor. Inside vspline, /// this is only used for masks, so we enable this code only if T is /// bool. Might be extended to integral types, though. template < template < typename , size_t > class X , typename T , size_t N > struct bitwise_op { typedef X < T , N > derived_t ; #define BITWISE_OP(OPFUNC,OPEQ) \ template < typename = std::enable_if \ < std::is_same < T , bool > :: value > > \ friend derived_t OPFUNC ( derived_t lhs , \ derived_t rhs ) \ { \ for ( size_t i = 0 ; i < N ; i++ ) \ lhs [ i ] OPEQ rhs [ i ] ; \ return lhs ; \ } BITWISE_OP(operator&,&=) BITWISE_OP(operator|,|=) BITWISE_OP(operator^,^=) #undef BITWISE_OP } ; /// 'broadcast_std_func' applies functions from namespace std to /// each element in a vector, or to each corresponding pair of /// elements in two vectors. While this might be extended, we only /// provide the set of functions which are actually needed inside /// vspline. Most functions work without using this mixin because /// vigra::TinyVector provides the 'rollout'. 
// TODO: abs is defined for vigra::TinyVector, but without rolling // it out here I don't get it to work on simd_tv. template < template < typename , size_t > class X , typename T , size_t N > struct broadcast_std_func { typedef X < T , N > derived_t ; #define BROADCAST_STD_FUNC(FUNC) \ friend derived_t FUNC ( derived_t arg ) \ { \ for ( size_t i = 0 ; i < N ; i++ ) \ arg [ i ] = std::FUNC ( arg [ i ] ) ; \ return arg ; \ } BROADCAST_STD_FUNC(abs) BROADCAST_STD_FUNC(trunc) // BROADCAST_STD_FUNC(round) // BROADCAST_STD_FUNC(floor) // BROADCAST_STD_FUNC(ceil) // BROADCAST_STD_FUNC(log) // BROADCAST_STD_FUNC(exp) // BROADCAST_STD_FUNC(sqrt) // BROADCAST_STD_FUNC(sin) // BROADCAST_STD_FUNC(cos) // BROADCAST_STD_FUNC(tan) // BROADCAST_STD_FUNC(asin) // BROADCAST_STD_FUNC(acos) // BROADCAST_STD_FUNC(atan) #undef BROADCAST_STD_FUNC // #define BROADCAST_STD_FUNC2(FUNC) \ // friend derived_t FUNC ( derived_t arg , derived_t arg2 ) \ // { \ // for ( size_t i = 0 ; i < N ; i++ ) \ // arg [ i ] = std::FUNC ( arg [ i ] , arg2 [ i ] ) ; \ // return arg ; \ // } // // BROADCAST_STD_FUNC2(atan2) // // #undef BROADCAST_STD_FUNC2 } ; /// 'reduce_to_bool' provides any_of, all_of and none_of, /// which reduce a vector to a boolean. currently unused, /// simd_tv delegates to vigra::TinyVector's 'all' and 'any' // template < template < typename , size_t > class X , // typename T , size_t N > // struct reduce_to_bool // { // typedef X < T , N > derived_t ; // // friend bool any_of ( const derived_t & arg ) // { // bool result = false ; // for ( size_t i = 0 ; i < N ; i++ ) // result |= bool ( arg[i] ) ; // return result ; // } // // friend bool all_of ( const derived_t & arg ) // { // bool result = true ; // for ( size_t i = 0 ; i < N ; i++ ) // result &= bool ( arg[i] ) ; // return result ; // } // // friend bool none_of ( const derived_t & arg ) // { // bool result = true ; // for ( size_t i = 0 ; i < N ; i++ ) // result &= ( ! 
bool ( arg[i] ) ) ; // return result ; // } // } ; /// struct simd_tv inherits from TinyVector and gets the SIMD functionality /// by inheriting a set of mixins. simd_tv is used inside vspline wherever /// vectorized data are processed - unless Vc is available and capable of /// vectorizing a given fundamental type. So Vc code is opt-in only - /// vspline can function without it, oftentimes even without much of a /// performance penalty, but if it can be used, Vc may provide an extra /// speedup which is critical for some use scenarios. Within vspline, I /// found the most notable speedup when performing b-spline evaluations /// on float data, which is a very common requirement. template < typename T , size_t N > struct simd_tv : public vigra::TinyVector < T , N > , public generic_simd_memory_access < simd_tv , T , N > , public compare < simd_tv , T , N > , public bitwise_op < simd_tv , T , N > , public broadcast_std_func < simd_tv , T , N > { typedef T value_type ; typedef simd_tv < T , N > this_t ; typedef vigra::TinyVector < T , N > inner_t ; typedef simd_tv < bool , N > mask_type ; typedef simd_tv < int , N > index_type ; static constexpr size_t size() { return N ; } // emulate Vc's IndexesFromZero, which I use often static const index_type IndexesFromZero() { index_type ix ; for ( int i = 0 ; i < N ; i++ ) ix[i] = i ; return ix ; } // perfect-forward any arguments to the base class ctor. Since // simd_tv inherits from vigra::TinyVector, this results in the // availability of all of vigra::TinyVector's ctor overloads. template < typename ... types > simd_tv ( types && ... args ) : inner_t ( std::forward < types > ( args ) ... ) { } ; // ctor from some other vector-like object which has the same // template argument signature and same size, // but element type U. the elements are taken singly, casting to T. 
template < typename U , template < typename , size_t > class other_v > simd_tv ( const other_v < U , N > & other ) { for ( size_t i = 0 ; i < N ; i++ ) (*this)[i] = T ( other[i] ) ; } #ifdef USE_VC // ctor taking a Vc::SimdArray which has additional // (normally default) template arguments. template < typename U , typename X , size_t x > simd_tv ( const Vc::SimdArray < U , N , X , x > & other ) { for ( size_t i = 0 ; i < N ; i++ ) (*this)[i] = T ( other[i] ) ; } #endif // overrides for assignment operators. For scalars, vigra only has op= // with double arguments, here we define the operations taking the vector's // value_type instead. We also define op= for vectors of equal type. #define OPEQ_FUNC(OPFUNC,OPEQ) \ this_t & OPFUNC ( const value_type & rhs ) \ { \ for ( size_t i = 0 ; i < N ; i++ ) \ (*this) [ i ] OPEQ rhs ; \ return *this ; \ } \ this_t & OPFUNC ( const this_t & rhs ) \ { \ for ( size_t i = 0 ; i < N ; i++ ) \ (*this) [ i ] OPEQ rhs [ i ] ; \ return *this ; \ } OPEQ_FUNC(operator+=,+=) OPEQ_FUNC(operator-=,-=) OPEQ_FUNC(operator*=,*=) OPEQ_FUNC(operator/=,/=) OPEQ_FUNC(operator%=,%=) OPEQ_FUNC(operator&=,&=) OPEQ_FUNC(operator|=,|=) OPEQ_FUNC(operator^=,^=) #undef OPEQ_FUNC // binary operator and left and right scalar operations with // a fundamental. vigra only accepts 'double' as scalar. // We use operatorX= instead, returning the same type as // the vector argument. Since this is code specifically // overriding vigra::TinyVector functionality, it's not // in a mixin. 
#define OP_FUNC(OPFUNC,OPEQ) \ this_t OPFUNC ( const value_type & rhs ) const \ { \ this_t help ( *this ) ; \ for ( size_t i = 0 ; i < N ; i++ ) \ help [ i ] OPEQ rhs ; \ return help ; \ } \ this_t OPFUNC ( const this_t & rhs ) const \ { \ this_t help ( *this ) ; \ for ( size_t i = 0 ; i < N ; i++ ) \ help [ i ] OPEQ rhs [ i ] ; \ return help ; \ } \ friend this_t OPFUNC ( const value_type & lhs , \ const this_t & rhs ) \ { \ this_t help ( lhs ) ; \ for ( size_t i = 0 ; i < N ; i++ ) \ help [ i ] OPEQ rhs [ i ] ; \ return help ; \ } OP_FUNC(operator+,+=) OP_FUNC(operator-,-=) OP_FUNC(operator*,*=) OP_FUNC(operator/,/=) OP_FUNC(operator%,%=) OP_FUNC(operator&,&=) OP_FUNC(operator|,|=) OP_FUNC(operator^,^=) #undef OP_FUNC // unary - this_t operator-() const { const inner_t & inner ( *this ) ; return - inner ; } // vigra::TinyVector offers 'any' and 'all', so we delegate to the // vigra code instead of using the generic mixin 'reduce_to_bool' friend bool any_of( const this_t & arg ) { return arg.any() ; } friend bool all_of ( const this_t & arg ) { return arg.all() ; } friend bool none_of ( const this_t & arg ) { return ! ( arg.any() ) ; } } ; // end of struct simd_tv // The next section codes use of vectorization in vspline. /// traits class simd_traits provides three traits: /// - 'hsize' holds the hardware vector width if applicable (used only with Vc) /// - 'type': template yielding the vector type for a given vectorization width /// - 'default_size': the default vectorization width to use for T /// /// default simd_traits: without further specialization, T will be vectorized /// as a vspline::simd_tv. This way, *all* types will be vectorized, there is /// no more fallback to scalar code for certain types. Scalar code will only be /// produced if the vectorization width is set to 1 in code taking this /// datum as a template argument. Note that the type simd_traits produces for /// sz == 1 is T itself, not a simd_tv of one element. 
template < typename T >
struct simd_traits
{
  template < size_t sz > using type =
    typename std::conditional
             < sz == 1 ,
               T ,
               vspline::simd_tv < T , sz >
             > :: type ;

  static const size_t hsize = 0 ;

  // heuristic default size: fill 64 bytes, or use a single T if T is larger

  enum { default_size =   sizeof ( T ) > 64
                        ? 1
                        : 64 / sizeof ( T ) } ;
} ;

#if defined USE_VC

// in a Vc ML discussion, M. Kretz states that the set of types Vc can
// vectorize (with 1.3) is consistent throughout all ABIs, so we can just
// list the acceptable types without having to take the ABI into account.
// For these types we specialize 'simd_traits', resulting in the use of
// the appropriate Vc::SimdArray.

#define VC_SIMD(T) \
template<> struct simd_traits<T> \
{ \
  static const size_t hsize = Vc::Vector < T > :: size() ; \
  template < size_t sz > using type = \
    typename std::conditional \
             < sz == 1 , \
               T , \
               Vc::SimdArray < T , sz > \
             > :: type ; \
  enum { default_size = 2 * hsize } ; \
} ;

VC_SIMD(float)
VC_SIMD(double)
VC_SIMD(int)
VC_SIMD(unsigned int)
VC_SIMD(short)
VC_SIMD(unsigned short)

#undef VC_SIMD

#endif // USE_VC

/// with the definition of 'simd_traits', we can proceed to implement
/// 'vector_traits':
/// struct vector_traits is a traits class fixing the types used for
/// vectorized code in vspline.
/// with the types defined by vector_traits, a system of type names is
/// introduced which uses a set of patterns:
///
/// - 'ele' stands for 'elementary', the type of an aggregate's component
/// - 'nd' stands for 'n-dimensional', a type aggregating one or more
///   components of a given elementary type
/// - a 'v' suffix indicates a 'simdized' type: vspline uses Vc::SimdArrays,
///   and vigra::TinyVectors of Vc::SimdArrays, if Vc is used and the type
///   can be used with Vc::SimdArray, and the equivalent types using
///   vspline::simd_tv instead of Vc::SimdArray otherwise
/// the unspecialized definition of class vector_traits will vectorize
/// by concatenating instances of T into the type simd_traits produces,
/// taking, per default, as many T as the default_size given there.
/// This will work with any type T, even though it makes most sense with
/// fundamentals.

template < typename T ,
           size_t _vsize = 0 ,
           typename Enable = void >
struct vector_traits
{
  // T is not 'element-expandable', so 'dimension' is 1 and T is ele_type

  enum { dimension = 1 } ;
  typedef T ele_type ;

  // find the right vectorization width

  enum { size =   _vsize == 0
                ? simd_traits < ele_type > :: default_size
                : _vsize } ;

  enum { vsize = size } ;
  enum { hsize = simd_traits < ele_type > :: hsize } ;

  // produce the 'synthetic' type

  typedef vigra::TinyVector < ele_type , 1 > nd_ele_type ;

  // the vectorized type

  template < typename U , size_t sz >
  using vector = typename simd_traits < U > :: template type < sz > ;

  typedef vector < ele_type , vsize > ele_v ;

  // and the 'synthetic' vectorized type

  typedef vigra::TinyVector < ele_v , 1 > nd_ele_v ;

  // for not 'element-expandable' T, we produce ele_v as 'type'

  typedef ele_v type ;
} ;

/// specialization of vector_traits for 'element-expandable' types.
/// These types are recognized by vigra's ExpandElementResult mechanism,
/// resulting in the formation of a 'vectorized' version of the type.
/// As explained above, vectorization is *horizontal*, so if T is, say,
/// a pixel of three floats, the type generated here will be a TinyVector
/// of three vectors of vsize floats.

template < typename T , size_t _vsize >
struct vector_traits
       < T ,
         _vsize ,
         typename std::enable_if
                  < vspline::is_element_expandable < T > :: value
                  > :: type >
{
  // T is 'element-expandable' - meaning it can be element-expanded
  // with vigra's ExpandElementResult mechanism. We use that to obtain
  // the elementary type and the dimension of T. Note that, if T is
  // fundamental, the resulting traits are the same as they would be
  // for the unspecialized case. What we're interested in here are
  // multi-channel types; that fundamentals are routed through here
  // is just as good as if they were routed through the unspecialized
  // case above.

  enum { dimension = vigra::ExpandElementResult < T > :: size } ;
  typedef typename vigra::ExpandElementResult < T > :: type ele_type ;

  // given the elementary type, we define nd_ele_type as a vigra::TinyVector
  // of ele_type. This is the 'synthetic' type.

  typedef vigra::TinyVector < ele_type , dimension > nd_ele_type ;

  // next we glean the number of elements a 'vector' should contain:
  // if the template argument '_vsize' was passed as 0, which is the default,
  // we use the default vector size which simd_traits provides; for an
  // explicitly specified _vsize we take the explicitly specified value.

  enum { size =   _vsize == 0
                ? simd_traits < ele_type > :: default_size
                : _vsize } ;

  // I prefer to use 'vsize' as it is more specific than mere 'size'

  enum { vsize = size } ;

  // hardware vector register size, if applicable - only used with Vc

  enum { hsize = simd_traits < T > :: hsize } ;

  // now we obtain the template for a vector of a given size. This will
  // be either Vc::SimdArray or vspline::simd_tv

  template < typename U , size_t sz >
  using vector = typename simd_traits < U > :: template type < sz > ;

  // using this template and the vectorization width we have established,
  // we obtain the vectorized data type for a component:

  typedef vector < ele_type , vsize > ele_v ;

  // nd_ele_v is the 'synthetic' vectorized type, which is always a
  // TinyVector of the vectorized component type, possibly with only
  // one element:

  typedef vigra::TinyVector < ele_v , dimension > nd_ele_v ;

  // finally, 'type' is the 'canonical' vectorized type, meaning that if
  // T is a fundamental we produce the component vector type itself, but
  // if it is some aggregate (like a TinyVector) we produce a TinyVector
  // of the component vector data type. So if T is float, 'type' is a
  // vector of float; if T is a TinyVector of one float, 'type' is a
  // TinyVector of one vector of float.

  typedef typename std::conditional
          < std::is_fundamental < T > :: value ,
            ele_v ,
            nd_ele_v
          > :: type type ;
} ;

/// this alias is used as a shorthand to pick the vectorized type
/// for a given type T and a size N from 'vector_traits':

template < typename T , size_t N >
using simdized_type = typename vector_traits < T , N > :: type ;

// In order to avoid syntax which is specific to a particular vectorization
// method, I use free functions for assignments, which avoid member
// functions of the vector objects. While this produces some notational
// inconvenience, it allows a formulation which is independent of the
// vectorization used. This way I can use a Vc::SimdArray as the target
// of an assignment from another vectorized data type, which would be
// impossible with operator=, which has to be a member function.

// KFJ 2018-05-11 added variants of assign with T == U. I was
// surprised to see that this did not produce ambiguity
// problems, but it seems okay.
// There are now quite a few 'assign' overloads, so that the code
// has explicit variants for several argument constellations. I'm
// not sure if this is really necessary (TODO: test), but I want to
// keep the impact of 'breaking out' low, to enable pipeline code
// to move from one type of vectorized object to another without
// too much performance penalty.

/// To uniformly handle assignments of vectorized and unvectorized
/// data, I use free functions. The first one is enabled if both
/// objects are fundamentals and simply assigns with a cast to the
/// target type:

template < typename T , typename U >
typename std::enable_if
         <    std::is_fundamental < T > :: value
           && std::is_fundamental < U > :: value
         > :: type
assign ( T & self , const U & other )
{
  self = T ( other ) ;
}

template < typename T >
typename std::enable_if
         < std::is_fundamental < T > :: value
         > :: type
assign ( T & self , const T & other )
{
  self = other ;
}

/// The second one is enabled if both arguments are simd_tv of
/// the same size

template < typename T , typename U , size_t sz >
void assign ( vspline::simd_tv < T , sz > & self ,
              const vspline::simd_tv < U , sz > & other )
{
  for ( size_t i = 0 ; i < sz ; i++ )
  {
    self[i] = T ( other[i] ) ;
  }
}

template < typename T , size_t sz >
void assign ( vspline::simd_tv < T , sz > & self ,
              const vspline::simd_tv < T , sz > & other )
{
  self = other ;
}

#ifdef USE_VC

/// with Vc in use, we have overloads for assignment from one
/// SimdArray to another, and the mixed forms

template < typename T , typename U , size_t sz >
void assign ( Vc::SimdArray < T , sz > & self ,
              const Vc::SimdArray < U , sz > & other )
{
  self = Vc::SimdArray < T , sz > ( other ) ;
}

template < typename T , size_t sz >
void assign ( Vc::SimdArray < T , sz > & self ,
              const Vc::SimdArray < T , sz > & other )
{
  self = other ;
}

template < typename T , typename U , size_t sz >
void assign ( Vc::SimdArray < T , sz > & self ,
              const vspline::simd_tv < U , sz > & other )
{
  for ( size_t i = 0 ; i < sz ; i++ )
  {
    self[i] = T ( other[i] ) ;
  }
}

template < typename T , size_t sz >
void assign ( Vc::SimdArray < T , sz > & self ,
              const vspline::simd_tv < T , sz > & other )
{
  self = Vc::SimdArray < T , sz > ( ( T* ) ( & other ) ) ;
}

template < typename T , typename U , size_t sz >
void assign ( vspline::simd_tv < T , sz > & self ,
              const Vc::SimdArray < U , sz > & other )
{
  for ( size_t i = 0 ; i < sz ; i++ )
  {
    self[i] = T ( other[i] ) ;
  }
}

template < typename T , size_t sz >
void assign ( vspline::simd_tv < T , sz > & self ,
              const Vc::SimdArray < T , sz > & other )
{
  self = vspline::simd_tv < T , sz > ( ( T* ) ( & other ) ) ;
}

#endif

/// assignment between two TinyVectors. This delegates to
/// 'assign' for the components, so compilation fails if
/// T and U aren't either fundamentals or vector types. This
/// is deliberate, since the use of 'assign' is only meant for
/// the specific arguments occurring in its overloads, rather
/// than as a universal way of assigning any two objects.

template < typename T , typename U , int sz >
void assign ( vigra::TinyVector < T , sz > & self ,
              const vigra::TinyVector < U , sz > & other )
{
  for ( int i = 0 ; i < sz ; i++ )
  {
    assign ( self[i] , other[i] ) ;
  }
}

/// generic free masked assignment function. Wherever 'mask' is true,
/// the corresponding entry in 'other' will be assigned to the
/// corresponding entry in 'self'. 'other' may be any type as long as
/// it can be used to construct a 'vector', so if 'other' is a scalar,
/// it is broadcast.
///
/// In my ongoing effort to factor out vectorization, I use this function
/// template in preference to Vc's convenient X(mask) = Y syntax. This
/// provides a uniform interface to masked assignment which can be used
/// with any indexable type.
///
/// TODO investigate elaboration of this function by extending it into
/// a family with a diversified argument spectrum (like, scalar 'other',
/// Vc::SimdArray + non-Vc mask etc.)
template < template < typename , size_t > class vector ,
           typename T , size_t N ,
           class other_type >
void assign_if ( vector < T , N > & self ,
                 const typename vector < T , N > :: mask_type & mask ,
                 const other_type & other )
{
  vector < T , N > help ( other ) ;
  for ( size_t i = 0 ; i < N ; i++ )
  {
    if ( mask [ i ] )
      self [ i ] = help [ i ] ;
  }
}

#ifdef USE_VC

/// overload of assign_if for Vc::SimdArray, using Vc's convenient
/// syntax of X(mask) = Y, which may be more efficient as well.

template < typename T , size_t N , class other_type >
void assign_if ( Vc::SimdArray < T , N > & self ,
                 const typename Vc::SimdArray < T , N > :: mask_type & mask ,
                 const other_type & other )
{
  self ( mask ) = other ;
}

#endif

} ; // end of namespace vspline

#endif // #ifndef VSPLINE_VECTOR_H

kfj-vspline-4b365417c271/vspline.doxy

# Doxyfile 1.8.6

# This file describes the settings to be used by the documentation system
# doxygen (www.doxygen.org) for a project.
#
# All text after a double hash (##) is considered a comment and is placed in
# front of the TAG it is preceding.
#
# All text after a single hash (#) is considered a comment and will be ignored.
# The format is:
# TAG = value [value, ...]
# For lists, items can also be appended using:
# TAG += value [value, ...]
# Values that contain spaces should be placed between quotes (\" \").

#---------------------------------------------------------------------------
# Project related configuration options
#---------------------------------------------------------------------------

# This tag specifies the encoding used for all characters in the config file
# that follow. The default is UTF-8 which is also the encoding used for all text
# before the first occurrence of this tag. Doxygen uses libiconv (or the iconv
# built into libc) for the transcoding. See http://www.gnu.org/software/libiconv
# for the list of possible encodings.
# The default value is: UTF-8. DOXYFILE_ENCODING = UTF-8 # The PROJECT_NAME tag is a single word (or a sequence of words surrounded by # double-quotes, unless you are using Doxywizard) that should identify the # project for which the documentation is generated. This name is used in the # title of most generated pages and in a few other places. # The default value is: My Project. PROJECT_NAME = "vspline" # The PROJECT_NUMBER tag can be used to enter a project or revision number. This # could be handy for archiving the generated documentation or if some version # control system is used. PROJECT_NUMBER = 0.4.1 # Using the PROJECT_BRIEF tag one can provide an optional one line description # for a project that appears at the top of each page and should give viewer a # quick idea about the purpose of the project. Keep the description short. PROJECT_BRIEF = "Generic C++11 Code for Uniform B-Splines" # With the PROJECT_LOGO tag one can specify an logo or icon that is included in # the documentation. The maximum height of the logo should not exceed 55 pixels # and the maximum width should not exceed 200 pixels. Doxygen will copy the logo # to the output directory. PROJECT_LOGO = # The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) path # into which the generated documentation will be written. If a relative path is # entered, it will be relative to the location where doxygen was started. If # left blank the current directory will be used. OUTPUT_DIRECTORY = ../kfj.bitbucket.org # If the CREATE_SUBDIRS tag is set to YES, then doxygen will create 4096 sub- # directories (in 2 levels) under the output directory of each output format and # will distribute the generated files over these directories. Enabling this # option can be useful when feeding doxygen a huge amount of source files, where # putting all generated files in the same directory would otherwise causes # performance problems for the file system. # The default value is: NO. 
CREATE_SUBDIRS = NO # The OUTPUT_LANGUAGE tag is used to specify the language in which all # documentation generated by doxygen is written. Doxygen will use this # information to generate all constant output in the proper language. # Possible values are: Afrikaans, Arabic, Armenian, Brazilian, Catalan, Chinese, # Chinese-Traditional, Croatian, Czech, Danish, Dutch, English (United States), # Esperanto, Farsi (Persian), Finnish, French, German, Greek, Hungarian, # Indonesian, Italian, Japanese, Japanese-en (Japanese with English messages), # Korean, Korean-en (Korean with English messages), Latvian, Lithuanian, # Macedonian, Norwegian, Persian (Farsi), Polish, Portuguese, Romanian, Russian, # Serbian, Serbian-Cyrillic, Slovak, Slovene, Spanish, Swedish, Turkish, # Ukrainian and Vietnamese. # The default value is: English. OUTPUT_LANGUAGE = English # If the BRIEF_MEMBER_DESC tag is set to YES doxygen will include brief member # descriptions after the members that are listed in the file and class # documentation (similar to Javadoc). Set to NO to disable this. # The default value is: YES. BRIEF_MEMBER_DESC = YES # If the REPEAT_BRIEF tag is set to YES doxygen will prepend the brief # description of a member or function before the detailed description # # Note: If both HIDE_UNDOC_MEMBERS and BRIEF_MEMBER_DESC are set to NO, the # brief descriptions will be completely suppressed. # The default value is: YES. REPEAT_BRIEF = YES # This tag implements a quasi-intelligent brief description abbreviator that is # used to form the text in various listings. Each string in this list, if found # as the leading text of the brief description, will be stripped from the text # and the result, after processing the whole list, is used as the annotated # text. Otherwise, the brief description is used as-is. 
If left blank, the # following values are used ($name is automatically replaced with the name of # the entity):The $name class, The $name widget, The $name file, is, provides, # specifies, contains, represents, a, an and the. ABBREVIATE_BRIEF = # If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then # doxygen will generate a detailed section even if there is only a brief # description. # The default value is: NO. ALWAYS_DETAILED_SEC = NO # If the INLINE_INHERITED_MEMB tag is set to YES, doxygen will show all # inherited members of a class in the documentation of that class as if those # members were ordinary class members. Constructors, destructors and assignment # operators of the base classes will not be shown. # The default value is: NO. INLINE_INHERITED_MEMB = NO # If the FULL_PATH_NAMES tag is set to YES doxygen will prepend the full path # before files name in the file list and in the header files. If set to NO the # shortest path that makes the file name unique will be used # The default value is: YES. FULL_PATH_NAMES = YES # The STRIP_FROM_PATH tag can be used to strip a user-defined part of the path. # Stripping is only done if one of the specified strings matches the left-hand # part of the path. The tag can be used to show relative paths in the file list. # If left blank the directory from which doxygen is run is used as the path to # strip. # # Note that you can specify absolute paths here, but also relative paths, which # will be relative from the directory where doxygen is started. # This tag requires that the tag FULL_PATH_NAMES is set to YES. STRIP_FROM_PATH = # The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of the # path mentioned in the documentation of a class, which tells the reader which # header file to include in order to use a class. If left blank only the name of # the header file containing the class definition is used. 
Otherwise one should # specify the list of include paths that are normally passed to the compiler # using the -I flag. STRIP_FROM_INC_PATH = # If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter (but # less readable) file names. This can be useful is your file systems doesn't # support long names like on DOS, Mac, or CD-ROM. # The default value is: NO. SHORT_NAMES = NO # If the JAVADOC_AUTOBRIEF tag is set to YES then doxygen will interpret the # first line (until the first dot) of a Javadoc-style comment as the brief # description. If set to NO, the Javadoc-style will behave just like regular Qt- # style comments (thus requiring an explicit @brief command for a brief # description.) # The default value is: NO. JAVADOC_AUTOBRIEF = NO # If the QT_AUTOBRIEF tag is set to YES then doxygen will interpret the first # line (until the first dot) of a Qt-style comment as the brief description. If # set to NO, the Qt-style will behave just like regular Qt-style comments (thus # requiring an explicit \brief command for a brief description.) # The default value is: NO. QT_AUTOBRIEF = NO # The MULTILINE_CPP_IS_BRIEF tag can be set to YES to make doxygen treat a # multi-line C++ special comment block (i.e. a block of //! or /// comments) as # a brief description. This used to be the default behavior. The new default is # to treat a multi-line C++ comment block as a detailed description. Set this # tag to YES if you prefer the old behavior instead. # # Note that setting this tag to YES also means that rational rose comments are # not recognized any more. # The default value is: NO. MULTILINE_CPP_IS_BRIEF = YES # If the INHERIT_DOCS tag is set to YES then an undocumented member inherits the # documentation from any documented member that it re-implements. # The default value is: YES. INHERIT_DOCS = YES # If the SEPARATE_MEMBER_PAGES tag is set to YES, then doxygen will produce a # new page for each member. 
If set to NO, the documentation of a member will be # part of the file/class/namespace that contains it. # The default value is: NO. SEPARATE_MEMBER_PAGES = NO # The TAB_SIZE tag can be used to set the number of spaces in a tab. Doxygen # uses this value to replace tabs by spaces in code fragments. # Minimum value: 1, maximum value: 16, default value: 4. TAB_SIZE = 4 # This tag can be used to specify a number of aliases that act as commands in # the documentation. An alias has the form: # name=value # For example adding # "sideeffect=@par Side Effects:\n" # will allow you to put the command \sideeffect (or @sideeffect) in the # documentation, which will result in a user-defined paragraph with heading # "Side Effects:". You can put \n's in the value part of an alias to insert # newlines. ALIASES = # This tag can be used to specify a number of word-keyword mappings (TCL only). # A mapping has the form "name=value". For example adding "class=itcl::class" # will allow you to use the command class in the itcl::class meaning. TCL_SUBST = # Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C sources # only. Doxygen will then generate output that is more tailored for C. For # instance, some of the names that are used will be different. The list of all # members will be omitted, etc. # The default value is: NO. OPTIMIZE_OUTPUT_FOR_C = NO # Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java or # Python sources only. Doxygen will then generate output that is more tailored # for that language. For instance, namespaces will be presented as packages, # qualified scopes will look different, etc. # The default value is: NO. OPTIMIZE_OUTPUT_JAVA = NO # Set the OPTIMIZE_FOR_FORTRAN tag to YES if your project consists of Fortran # sources. Doxygen will then generate output that is tailored for Fortran. # The default value is: NO. OPTIMIZE_FOR_FORTRAN = NO # Set the OPTIMIZE_OUTPUT_VHDL tag to YES if your project consists of VHDL # sources. 
Doxygen will then generate output that is tailored for VHDL. # The default value is: NO. OPTIMIZE_OUTPUT_VHDL = NO # Doxygen selects the parser to use depending on the extension of the files it # parses. With this tag you can assign which parser to use for a given # extension. Doxygen has a built-in mapping, but you can override or extend it # using this tag. The format is ext=language, where ext is a file extension, and # language is one of the parsers supported by doxygen: IDL, Java, Javascript, # C#, C, C++, D, PHP, Objective-C, Python, Fortran, VHDL. For instance to make # doxygen treat .inc files as Fortran files (default is PHP), and .f files as C # (default is Fortran), use: inc=Fortran f=C. # # Note For files without extension you can use no_extension as a placeholder. # # Note that for custom extensions you also need to set FILE_PATTERNS otherwise # the files are not read by doxygen. EXTENSION_MAPPING = # If the MARKDOWN_SUPPORT tag is enabled then doxygen pre-processes all comments # according to the Markdown format, which allows for more readable # documentation. See http://daringfireball.net/projects/markdown/ for details. # The output of markdown processing is further processed by doxygen, so you can # mix doxygen, HTML, and XML commands with Markdown formatting. Disable only in # case of backward compatibilities issues. # The default value is: YES. MARKDOWN_SUPPORT = YES # When enabled doxygen tries to link words that correspond to documented # classes, or namespaces to their corresponding documentation. Such a link can # be prevented in individual cases by by putting a % sign in front of the word # or globally by setting AUTOLINK_SUPPORT to NO. # The default value is: YES. AUTOLINK_SUPPORT = YES # If you use STL classes (i.e. std::string, std::vector, etc.) 
but do not want # to include (a tag file for) the STL sources as input, then you should set this # tag to YES in order to let doxygen match functions declarations and # definitions whose arguments contain STL classes (e.g. func(std::string); # versus func(std::string) {}). This also make the inheritance and collaboration # diagrams that involve STL classes more complete and accurate. # The default value is: NO. BUILTIN_STL_SUPPORT = NO # If you use Microsoft's C++/CLI language, you should set this option to YES to # enable parsing support. # The default value is: NO. CPP_CLI_SUPPORT = NO # Set the SIP_SUPPORT tag to YES if your project consists of sip (see: # http://www.riverbankcomputing.co.uk/software/sip/intro) sources only. Doxygen # will parse them like normal C++ but will assume all classes use public instead # of private inheritance when no explicit protection keyword is present. # The default value is: NO. SIP_SUPPORT = NO # For Microsoft's IDL there are propget and propput attributes to indicate # getter and setter methods for a property. Setting this option to YES will make # doxygen to replace the get and set methods by a property in the documentation. # This will only work if the methods are indeed getting or setting a simple # type. If this is not the case, or you want to show the methods anyway, you # should set this option to NO. # The default value is: YES. IDL_PROPERTY_SUPPORT = YES # If member grouping is used in the documentation and the DISTRIBUTE_GROUP_DOC # tag is set to YES, then doxygen will reuse the documentation of the first # member in the group (if any) for the other members of the group. By default # all members of a group must be documented explicitly. # The default value is: NO. DISTRIBUTE_GROUP_DOC = NO # Set the SUBGROUPING tag to YES to allow class member groups of the same type # (for instance a group of public functions) to be put as a subgroup of that # type (e.g. under the Public Functions section). 
Set it to NO to prevent # subgrouping. Alternatively, this can be done per class using the # \nosubgrouping command. # The default value is: YES. SUBGROUPING = YES # When the INLINE_GROUPED_CLASSES tag is set to YES, classes, structs and unions # are shown inside the group in which they are included (e.g. using \ingroup) # instead of on a separate page (for HTML and Man pages) or section (for LaTeX # and RTF). # # Note that this feature does not work in combination with # SEPARATE_MEMBER_PAGES. # The default value is: NO. INLINE_GROUPED_CLASSES = NO # When the INLINE_SIMPLE_STRUCTS tag is set to YES, structs, classes, and unions # with only public data fields or simple typedef fields will be shown inline in # the documentation of the scope in which they are defined (i.e. file, # namespace, or group documentation), provided this scope is documented. If set # to NO, structs, classes, and unions are shown on a separate page (for HTML and # Man pages) or section (for LaTeX and RTF). # The default value is: NO. INLINE_SIMPLE_STRUCTS = NO # When TYPEDEF_HIDES_STRUCT tag is enabled, a typedef of a struct, union, or # enum is documented as struct, union, or enum with the name of the typedef. So # typedef struct TypeS {} TypeT, will appear in the documentation as a struct # with name TypeT. When disabled the typedef will appear as a member of a file, # namespace, or class. And the struct will be named TypeS. This can typically be # useful for C code in case the coding convention dictates that all compound # types are typedef'ed and only the typedef is referenced, never the tag name. # The default value is: NO. TYPEDEF_HIDES_STRUCT = NO # The size of the symbol lookup cache can be set using LOOKUP_CACHE_SIZE. This # cache is used to resolve symbols given their name and scope. Since this can be # an expensive process and often the same symbol appears multiple times in the # code, doxygen keeps a cache of pre-resolved symbols. 
If the cache is too small # doxygen will become slower. If the cache is too large, memory is wasted. The # cache size is given by this formula: 2^(16+LOOKUP_CACHE_SIZE). The valid range # is 0..9, the default is 0, corresponding to a cache size of 2^16=65536 # symbols. At the end of a run doxygen will report the cache usage and suggest # the optimal cache size from a speed point of view. # Minimum value: 0, maximum value: 9, default value: 0. LOOKUP_CACHE_SIZE = 0 #--------------------------------------------------------------------------- # Build related configuration options #--------------------------------------------------------------------------- # If the EXTRACT_ALL tag is set to YES doxygen will assume all entities in # documentation are documented, even if no documentation was available. Private # class members and static file members will be hidden unless the # EXTRACT_PRIVATE respectively EXTRACT_STATIC tags are set to YES. # Note: This will also disable the warnings about undocumented members that are # normally produced when WARNINGS is set to YES. # The default value is: NO. EXTRACT_ALL = NO # If the EXTRACT_PRIVATE tag is set to YES all private members of a class will # be included in the documentation. # The default value is: NO. EXTRACT_PRIVATE = NO # If the EXTRACT_PACKAGE tag is set to YES all members with package or internal # scope will be included in the documentation. # The default value is: NO. EXTRACT_PACKAGE = NO # If the EXTRACT_STATIC tag is set to YES all static members of a file will be # included in the documentation. # The default value is: NO. EXTRACT_STATIC = NO # If the EXTRACT_LOCAL_CLASSES tag is set to YES classes (and structs) defined # locally in source files will be included in the documentation. If set to NO # only classes defined in header files are included. Does not have any effect # for Java sources. # The default value is: YES. EXTRACT_LOCAL_CLASSES = YES # This flag is only useful for Objective-C code. 
When set to YES local methods, # which are defined in the implementation section but not in the interface are # included in the documentation. If set to NO only methods in the interface are # included. # The default value is: NO. EXTRACT_LOCAL_METHODS = NO # If this flag is set to YES, the members of anonymous namespaces will be # extracted and appear in the documentation as a namespace called # 'anonymous_namespace{file}', where file will be replaced with the base name of # the file that contains the anonymous namespace. By default anonymous namespace # are hidden. # The default value is: NO. EXTRACT_ANON_NSPACES = NO # If the HIDE_UNDOC_MEMBERS tag is set to YES, doxygen will hide all # undocumented members inside documented classes or files. If set to NO these # members will be included in the various overviews, but no documentation # section is generated. This option has no effect if EXTRACT_ALL is enabled. # The default value is: NO. HIDE_UNDOC_MEMBERS = NO # If the HIDE_UNDOC_CLASSES tag is set to YES, doxygen will hide all # undocumented classes that are normally visible in the class hierarchy. If set # to NO these classes will be included in the various overviews. This option has # no effect if EXTRACT_ALL is enabled. # The default value is: NO. HIDE_UNDOC_CLASSES = NO # If the HIDE_FRIEND_COMPOUNDS tag is set to YES, doxygen will hide all friend # (class|struct|union) declarations. If set to NO these declarations will be # included in the documentation. # The default value is: NO. HIDE_FRIEND_COMPOUNDS = NO # If the HIDE_IN_BODY_DOCS tag is set to YES, doxygen will hide any # documentation blocks found inside the body of a function. If set to NO these # blocks will be appended to the function's detailed documentation block. # The default value is: NO. HIDE_IN_BODY_DOCS = NO # The INTERNAL_DOCS tag determines if documentation that is typed after a # \internal command is included. If the tag is set to NO then the documentation # will be excluded. 
# Set it to YES to include the internal documentation.
# The default value is: NO.

INTERNAL_DOCS          = NO

# If the CASE_SENSE_NAMES tag is set to NO then doxygen will only generate file
# names in lower-case letters. If set to YES upper-case letters are also
# allowed. This is useful if you have classes or files whose names only differ
# in case and if your file system supports case sensitive file names. Windows
# and Mac users are advised to set this option to NO.
# The default value is: system dependent.

CASE_SENSE_NAMES       = YES

# If the HIDE_SCOPE_NAMES tag is set to NO then doxygen will show members with
# their full class and namespace scopes in the documentation. If set to YES the
# scope will be hidden.
# The default value is: NO.

HIDE_SCOPE_NAMES       = NO

# If the SHOW_INCLUDE_FILES tag is set to YES then doxygen will put a list of
# the files that are included by a file in the documentation of that file.
# The default value is: YES.

SHOW_INCLUDE_FILES     = YES

# If the SHOW_GROUPED_MEMB_INC tag is set to YES then Doxygen will add for each
# grouped member an include statement to the documentation, telling the reader
# which file to include in order to use the member.
# The default value is: NO.

SHOW_GROUPED_MEMB_INC  = NO

# If the FORCE_LOCAL_INCLUDES tag is set to YES then doxygen will list include
# files with double quotes in the documentation rather than with sharp brackets.
# The default value is: NO.

FORCE_LOCAL_INCLUDES   = NO

# If the INLINE_INFO tag is set to YES then a tag [inline] is inserted in the
# documentation for inline members.
# The default value is: YES.

INLINE_INFO            = YES

# If the SORT_MEMBER_DOCS tag is set to YES then doxygen will sort the
# (detailed) documentation of file and class members alphabetically by member
# name. If set to NO the members will appear in declaration order.
# The default value is: YES.
SORT_MEMBER_DOCS       = YES

# If the SORT_BRIEF_DOCS tag is set to YES then doxygen will sort the brief
# descriptions of file, namespace and class members alphabetically by member
# name. If set to NO the members will appear in declaration order. Note that
# this will also influence the order of the classes in the class list.
# The default value is: NO.

SORT_BRIEF_DOCS        = NO

# If the SORT_MEMBERS_CTORS_1ST tag is set to YES then doxygen will sort the
# (brief and detailed) documentation of class members so that constructors and
# destructors are listed first. If set to NO the constructors will appear in the
# respective orders defined by SORT_BRIEF_DOCS and SORT_MEMBER_DOCS.
# Note: If SORT_BRIEF_DOCS is set to NO this option is ignored for sorting brief
# member documentation.
# Note: If SORT_MEMBER_DOCS is set to NO this option is ignored for sorting
# detailed member documentation.
# The default value is: NO.

SORT_MEMBERS_CTORS_1ST = NO

# If the SORT_GROUP_NAMES tag is set to YES then doxygen will sort the hierarchy
# of group names into alphabetical order. If set to NO the group names will
# appear in their defined order.
# The default value is: NO.

SORT_GROUP_NAMES       = NO

# If the SORT_BY_SCOPE_NAME tag is set to YES, the class list will be sorted by
# fully-qualified names, including namespaces. If set to NO, the class list will
# be sorted only by class name, not including the namespace part.
# Note: This option is not very useful if HIDE_SCOPE_NAMES is set to YES.
# Note: This option applies only to the class list, not to the alphabetical
# list.
# The default value is: NO.

SORT_BY_SCOPE_NAME     = NO

# If the STRICT_PROTO_MATCHING option is enabled and doxygen fails to do proper
# type resolution of all parameters of a function it will reject a match between
# the prototype and the implementation of a member function even if there is
# only one candidate or it is obvious which candidate to choose by doing a
# simple string match.
# By disabling STRICT_PROTO_MATCHING doxygen will still
# accept a match between prototype and implementation in such cases.
# The default value is: NO.

STRICT_PROTO_MATCHING  = NO

# The GENERATE_TODOLIST tag can be used to enable ( YES) or disable ( NO) the
# todo list. This list is created by putting \todo commands in the
# documentation.
# The default value is: YES.

GENERATE_TODOLIST      = YES

# The GENERATE_TESTLIST tag can be used to enable ( YES) or disable ( NO) the
# test list. This list is created by putting \test commands in the
# documentation.
# The default value is: YES.

GENERATE_TESTLIST      = YES

# The GENERATE_BUGLIST tag can be used to enable ( YES) or disable ( NO) the bug
# list. This list is created by putting \bug commands in the documentation.
# The default value is: YES.

GENERATE_BUGLIST       = YES

# The GENERATE_DEPRECATEDLIST tag can be used to enable ( YES) or disable ( NO)
# the deprecated list. This list is created by putting \deprecated commands in
# the documentation.
# The default value is: YES.

GENERATE_DEPRECATEDLIST= YES

# The ENABLED_SECTIONS tag can be used to enable conditional documentation
# sections, marked by \if <section_label> ... \endif and \cond <section_label>
# ... \endcond blocks.

ENABLED_SECTIONS       =

# The MAX_INITIALIZER_LINES tag determines the maximum number of lines that the
# initial value of a variable or macro / define can have for it to appear in the
# documentation. If the initializer consists of more lines than specified here
# it will be hidden. Use a value of 0 to hide initializers completely. The
# appearance of the value of individual variables and macros / defines can be
# controlled using \showinitializer or \hideinitializer command in the
# documentation regardless of this setting.
# Minimum value: 0, maximum value: 10000, default value: 30.

MAX_INITIALIZER_LINES  = 30

# Set the SHOW_USED_FILES tag to NO to disable the list of files generated at
# the bottom of the documentation of classes and structs.
# If set to YES the list
# will mention the files that were used to generate the documentation.
# The default value is: YES.

SHOW_USED_FILES        = YES

# Set the SHOW_FILES tag to NO to disable the generation of the Files page. This
# will remove the Files entry from the Quick Index and from the Folder Tree View
# (if specified).
# The default value is: YES.

SHOW_FILES             = YES

# Set the SHOW_NAMESPACES tag to NO to disable the generation of the Namespaces
# page. This will remove the Namespaces entry from the Quick Index and from the
# Folder Tree View (if specified).
# The default value is: YES.

SHOW_NAMESPACES        = YES

# The FILE_VERSION_FILTER tag can be used to specify a program or script that
# doxygen should invoke to get the current version for each file (typically from
# the version control system). Doxygen will invoke the program by executing (via
# popen()) the command command input-file, where command is the value of the
# FILE_VERSION_FILTER tag, and input-file is the name of an input file provided
# by doxygen. Whatever the program writes to standard output is used as the file
# version. For an example see the documentation.

FILE_VERSION_FILTER    =

# The LAYOUT_FILE tag can be used to specify a layout file which will be parsed
# by doxygen. The layout file controls the global structure of the generated
# output files in an output format independent way. To create the layout file
# that represents doxygen's defaults, run doxygen with the -l option. You can
# optionally specify a file name after the option, if omitted DoxygenLayout.xml
# will be used as the name of the layout file.
#
# Note that if you run doxygen from a directory containing a file called
# DoxygenLayout.xml, doxygen will parse it automatically even if the LAYOUT_FILE
# tag is left empty.

LAYOUT_FILE            =

# The CITE_BIB_FILES tag can be used to specify one or more bib files containing
# the reference definitions. This must be a list of .bib files.
# The .bib
# extension is automatically appended if omitted. This requires the bibtex tool
# to be installed. See also http://en.wikipedia.org/wiki/BibTeX for more info.
# For LaTeX the style of the bibliography can be controlled using
# LATEX_BIB_STYLE. To use this feature you need bibtex and perl available in the
# search path. Do not use file names with spaces, bibtex cannot handle them. See
# also \cite for info how to create references.

CITE_BIB_FILES         =

#---------------------------------------------------------------------------
# Configuration options related to warning and progress messages
#---------------------------------------------------------------------------

# The QUIET tag can be used to turn on/off the messages that are generated to
# standard output by doxygen. If QUIET is set to YES this implies that the
# messages are off.
# The default value is: NO.

QUIET                  = NO

# The WARNINGS tag can be used to turn on/off the warning messages that are
# generated to standard error ( stderr) by doxygen. If WARNINGS is set to YES
# this implies that the warnings are on.
#
# Tip: Turn warnings on while writing the documentation.
# The default value is: YES.

WARNINGS               = YES

# If the WARN_IF_UNDOCUMENTED tag is set to YES, then doxygen will generate
# warnings for undocumented members. If EXTRACT_ALL is set to YES then this flag
# will automatically be disabled.
# The default value is: YES.

WARN_IF_UNDOCUMENTED   = YES

# If the WARN_IF_DOC_ERROR tag is set to YES, doxygen will generate warnings for
# potential errors in the documentation, such as not documenting some parameters
# in a documented function, or documenting parameters that don't exist or using
# markup commands wrongly.
# The default value is: YES.

WARN_IF_DOC_ERROR      = YES

# This WARN_NO_PARAMDOC option can be enabled to get warnings for functions that
# are documented, but have no documentation for their parameters or return
# value.
# If set to NO doxygen will only warn about wrong or incomplete parameter
# documentation, but not about the absence of documentation.
# The default value is: NO.

WARN_NO_PARAMDOC       = NO

# The WARN_FORMAT tag determines the format of the warning messages that doxygen
# can produce. The string should contain the $file, $line, and $text tags, which
# will be replaced by the file and line number from which the warning originated
# and the warning text. Optionally the format may contain $version, which will
# be replaced by the version of the file (if it could be obtained via
# FILE_VERSION_FILTER)
# The default value is: $file:$line: $text.

WARN_FORMAT            = "$file:$line: $text"

# The WARN_LOGFILE tag can be used to specify a file to which warning and error
# messages should be written. If left blank the output is written to standard
# error (stderr).

WARN_LOGFILE           =

#---------------------------------------------------------------------------
# Configuration options related to the input files
#---------------------------------------------------------------------------

# The INPUT tag is used to specify the files and/or directories that contain
# documented source files. You may enter file names like myfile.cpp or
# directories like /usr/src/myproject. Separate the files or directories with
# spaces.
# Note: If this tag is empty the current directory is searched.

INPUT                  = . example

# This tag can be used to specify the character encoding of the source files
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
# libiconv (or the iconv built into libc) for the transcoding. See the libiconv
# documentation (see: http://www.gnu.org/software/libiconv) for the list of
# possible encodings.
# The default value is: UTF-8.

INPUT_ENCODING         = UTF-8

# If the value of the INPUT tag contains directories, you can use the
# FILE_PATTERNS tag to specify one or more wildcard patterns (like *.cpp and
# *.h) to filter out the source-files in the directories.
# If left blank the
# following patterns are tested:*.c, *.cc, *.cxx, *.cpp, *.c++, *.java, *.ii,
# *.ixx, *.ipp, *.i++, *.inl, *.idl, *.ddl, *.odl, *.h, *.hh, *.hxx, *.hpp,
# *.h++, *.cs, *.d, *.php, *.php4, *.php5, *.phtml, *.inc, *.m, *.markdown,
# *.md, *.mm, *.dox, *.py, *.f90, *.f, *.for, *.tcl, *.vhd, *.vhdl, *.ucf,
# *.qsf, *.as and *.js.

FILE_PATTERNS          =

# The RECURSIVE tag can be used to specify whether or not subdirectories should
# be searched for input files as well.
# The default value is: NO.

RECURSIVE              = NO

# The EXCLUDE tag can be used to specify files and/or directories that should be
# excluded from the INPUT source files. This way you can easily exclude a
# subdirectory from a directory tree whose root is specified with the INPUT tag.
#
# Note that relative paths are relative to the directory from which doxygen is
# run.

EXCLUDE                =

# The EXCLUDE_SYMLINKS tag can be used to select whether or not files or
# directories that are symbolic links (a Unix file system feature) are excluded
# from the input.
# The default value is: NO.

EXCLUDE_SYMLINKS       = NO

# If the value of the INPUT tag contains directories, you can use the
# EXCLUDE_PATTERNS tag to specify one or more wildcard patterns to exclude
# certain files from those directories.
#
# Note that the wildcards are matched against the file with absolute path, so to
# exclude all test directories for example use the pattern */test/*

EXCLUDE_PATTERNS       =

# The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names
# (namespaces, classes, functions, etc.) that should be excluded from the
# output. The symbol name can be a fully qualified name, a word, or if the
# wildcard * is used, a substring.
# Examples: ANamespace, AClass,
# AClass::ANamespace, ANamespace::*Test
#
# Note that the wildcards are matched against the file with absolute path, so to
# exclude all test directories use the pattern */test/*

EXCLUDE_SYMBOLS        =

# The EXAMPLE_PATH tag can be used to specify one or more files or directories
# that contain example code fragments that are included (see the \include
# command).

EXAMPLE_PATH           =

# If the value of the EXAMPLE_PATH tag contains directories, you can use the
# EXAMPLE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp and
# *.h) to filter out the source-files in the directories. If left blank all
# files are included.

EXAMPLE_PATTERNS       =

# If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be
# searched for input files to be used with the \include or \dontinclude commands
# irrespective of the value of the RECURSIVE tag.
# The default value is: NO.

EXAMPLE_RECURSIVE      = NO

# The IMAGE_PATH tag can be used to specify one or more files or directories
# that contain images that are to be included in the documentation (see the
# \image command).

IMAGE_PATH             =

# The INPUT_FILTER tag can be used to specify a program that doxygen should
# invoke to filter for each input file. Doxygen will invoke the filter program
# by executing (via popen()) the command:
#
# <filter> <input-file>
#
# where <filter> is the value of the INPUT_FILTER tag, and <input-file> is the
# name of an input file. Doxygen will then use the output that the filter
# program writes to standard output. If FILTER_PATTERNS is specified, this tag
# will be ignored.
#
# Note that the filter must not add or remove lines; it is applied before the
# code is scanned, but not when the output code is generated. If lines are added
# or removed, the anchors will not be placed correctly.

INPUT_FILTER           =

# The FILTER_PATTERNS tag can be used to specify filters on a per file pattern
# basis. Doxygen will compare the file name with each pattern and apply the
# filter if there is a match.
# The filters are a list of the form: pattern=filter
# (like *.cpp=my_cpp_filter). See INPUT_FILTER for further information on how
# filters are used. If the FILTER_PATTERNS tag is empty or if none of the
# patterns match the file name, INPUT_FILTER is applied.

FILTER_PATTERNS        =

# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using
# INPUT_FILTER ) will also be used to filter the input files that are used for
# producing the source files to browse (i.e. when SOURCE_BROWSER is set to YES).
# The default value is: NO.

FILTER_SOURCE_FILES    = NO

# The FILTER_SOURCE_PATTERNS tag can be used to specify source filters per file
# pattern. A pattern will override the setting for FILTER_PATTERN (if any) and
# it is also possible to disable source filtering for a specific pattern using
# *.ext= (so without naming a filter).
# This tag requires that the tag FILTER_SOURCE_FILES is set to YES.

FILTER_SOURCE_PATTERNS =

# If the USE_MDFILE_AS_MAINPAGE tag refers to the name of a markdown file that
# is part of the input, its contents will be placed on the main page
# (index.html). This can be useful if you have a project on for instance GitHub
# and want to reuse the introduction page also for the doxygen output.

USE_MDFILE_AS_MAINPAGE =

#---------------------------------------------------------------------------
# Configuration options related to source browsing
#---------------------------------------------------------------------------

# If the SOURCE_BROWSER tag is set to YES then a list of source files will be
# generated. Documented entities will be cross-referenced with these sources.
#
# Note: To get rid of all source code in the generated output, make sure that
# also VERBATIM_HEADERS is set to NO.
# The default value is: NO.

SOURCE_BROWSER         = YES

# Setting the INLINE_SOURCES tag to YES will include the body of functions,
# classes and enums directly into the documentation.
# The default value is: NO.
INLINE_SOURCES         = NO

# Setting the STRIP_CODE_COMMENTS tag to YES will instruct doxygen to hide any
# special comment blocks from generated source code fragments. Normal C, C++ and
# Fortran comments will always remain visible.
# The default value is: YES.

STRIP_CODE_COMMENTS    = NO

# If the REFERENCED_BY_RELATION tag is set to YES then for each documented
# function all documented functions referencing it will be listed.
# The default value is: NO.

REFERENCED_BY_RELATION = NO

# If the REFERENCES_RELATION tag is set to YES then for each documented function
# all documented entities called/used by that function will be listed.
# The default value is: NO.

REFERENCES_RELATION    = NO

# If the REFERENCES_LINK_SOURCE tag is set to YES and SOURCE_BROWSER tag is set
# to YES, then the hyperlinks from functions in REFERENCES_RELATION and
# REFERENCED_BY_RELATION lists will link to the source code. Otherwise they will
# link to the documentation.
# The default value is: YES.

REFERENCES_LINK_SOURCE = YES

# If SOURCE_TOOLTIPS is enabled (the default) then hovering a hyperlink in the
# source code will show a tooltip with additional information such as prototype,
# brief description and links to the definition and documentation. Since this
# will make the HTML file larger and loading of large files a bit slower, you
# can opt to disable this feature.
# The default value is: YES.
# This tag requires that the tag SOURCE_BROWSER is set to YES.

SOURCE_TOOLTIPS        = YES

# If the USE_HTAGS tag is set to YES then the references to source code will
# point to the HTML generated by the htags(1) tool instead of doxygen built-in
# source browser. The htags tool is part of GNU's global source tagging system
# (see http://www.gnu.org/software/global/global.html). You will need version
# 4.8.6 or higher.
#
# To use it do the following:
# - Install the latest version of global
# - Enable SOURCE_BROWSER and USE_HTAGS in the config file
# - Make sure the INPUT points to the root of the source tree
# - Run doxygen as normal
#
# Doxygen will invoke htags (and that will in turn invoke gtags), so these
# tools must be available from the command line (i.e. in the search path).
#
# The result: instead of the source browser generated by doxygen, the links to
# source code will now point to the output of htags.
# The default value is: NO.
# This tag requires that the tag SOURCE_BROWSER is set to YES.

USE_HTAGS              = NO

# If the VERBATIM_HEADERS tag is set the YES then doxygen will generate a
# verbatim copy of the header file for each class for which an include is
# specified. Set to NO to disable this.
# See also: Section \class.
# The default value is: YES.

VERBATIM_HEADERS       = YES

#---------------------------------------------------------------------------
# Configuration options related to the alphabetical class index
#---------------------------------------------------------------------------

# If the ALPHABETICAL_INDEX tag is set to YES, an alphabetical index of all
# compounds will be generated. Enable this if the project contains a lot of
# classes, structs, unions or interfaces.
# The default value is: YES.

ALPHABETICAL_INDEX     = YES

# The COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns in
# which the alphabetical index list will be split.
# Minimum value: 1, maximum value: 20, default value: 5.
# This tag requires that the tag ALPHABETICAL_INDEX is set to YES.

COLS_IN_ALPHA_INDEX    = 5

# In case all classes in a project start with a common prefix, all classes will
# be put under the same header in the alphabetical index. The IGNORE_PREFIX tag
# can be used to specify a prefix (or a list of prefixes) that should be ignored
# while generating the index headers.
# This tag requires that the tag ALPHABETICAL_INDEX is set to YES.
IGNORE_PREFIX          =

#---------------------------------------------------------------------------
# Configuration options related to the HTML output
#---------------------------------------------------------------------------

# If the GENERATE_HTML tag is set to YES doxygen will generate HTML output
# The default value is: YES.

GENERATE_HTML          = YES

# The HTML_OUTPUT tag is used to specify where the HTML docs will be put. If a
# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of
# it.
# The default directory is: html.
# This tag requires that the tag GENERATE_HTML is set to YES.

HTML_OUTPUT            = .

# The HTML_FILE_EXTENSION tag can be used to specify the file extension for each
# generated HTML page (for example: .htm, .php, .asp).
# The default value is: .html.
# This tag requires that the tag GENERATE_HTML is set to YES.

HTML_FILE_EXTENSION    = .html

# The HTML_HEADER tag can be used to specify a user-defined HTML header file for
# each generated HTML page. If the tag is left blank doxygen will generate a
# standard header.
#
# To get valid HTML the header file that includes any scripts and style sheets
# that doxygen needs, which is dependent on the configuration options used (e.g.
# the setting GENERATE_TREEVIEW). It is highly recommended to start with a
# default header using
# doxygen -w html new_header.html new_footer.html new_stylesheet.css
# YourConfigFile
# and then modify the file new_header.html. See also section "Doxygen usage"
# for information on how to generate the default header that doxygen normally
# uses.
# Note: The header is subject to change so you typically have to regenerate the
# default header when upgrading to a newer version of doxygen. For a description
# of the possible markers and block names see the documentation.
# This tag requires that the tag GENERATE_HTML is set to YES.

HTML_HEADER            =

# The HTML_FOOTER tag can be used to specify a user-defined HTML footer for each
# generated HTML page.
# If the tag is left blank doxygen will generate a standard
# footer. See HTML_HEADER for more information on how to generate a default
# footer and what special commands can be used inside the footer. See also
# section "Doxygen usage" for information on how to generate the default footer
# that doxygen normally uses.
# This tag requires that the tag GENERATE_HTML is set to YES.

HTML_FOOTER            =

# The HTML_STYLESHEET tag can be used to specify a user-defined cascading style
# sheet that is used by each HTML page. It can be used to fine-tune the look of
# the HTML output. If left blank doxygen will generate a default style sheet.
# See also section "Doxygen usage" for information on how to generate the style
# sheet that doxygen normally uses.
# Note: It is recommended to use HTML_EXTRA_STYLESHEET instead of this tag, as
# it is more robust and this tag (HTML_STYLESHEET) will in the future become
# obsolete.
# This tag requires that the tag GENERATE_HTML is set to YES.

HTML_STYLESHEET        =

# The HTML_EXTRA_STYLESHEET tag can be used to specify an additional user-
# defined cascading style sheet that is included after the standard style sheets
# created by doxygen. Using this option one can overrule certain style aspects.
# This is preferred over using HTML_STYLESHEET since it does not replace the
# standard style sheet and is therefor more robust against future updates.
# Doxygen will copy the style sheet file to the output directory. For an example
# see the documentation.
# This tag requires that the tag GENERATE_HTML is set to YES.

HTML_EXTRA_STYLESHEET  =

# The HTML_EXTRA_FILES tag can be used to specify one or more extra images or
# other source files which should be copied to the HTML output directory. Note
# that these files will be copied to the base HTML output directory. Use the
# $relpath^ marker in the HTML_HEADER and/or HTML_FOOTER files to load these
# files. In the HTML_STYLESHEET file, use the file name only.
# Also note that the
# files will be copied as-is; there are no commands or markers available.
# This tag requires that the tag GENERATE_HTML is set to YES.

HTML_EXTRA_FILES       =

# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. Doxygen
# will adjust the colors in the stylesheet and background images according to
# this color. Hue is specified as an angle on a colorwheel, see
# http://en.wikipedia.org/wiki/Hue for more information. For instance the value
# 0 represents red, 60 is yellow, 120 is green, 180 is cyan, 240 is blue, 300
# purple, and 360 is red again.
# Minimum value: 0, maximum value: 359, default value: 220.
# This tag requires that the tag GENERATE_HTML is set to YES.

HTML_COLORSTYLE_HUE    = 220

# The HTML_COLORSTYLE_SAT tag controls the purity (or saturation) of the colors
# in the HTML output. For a value of 0 the output will use grayscales only. A
# value of 255 will produce the most vivid colors.
# Minimum value: 0, maximum value: 255, default value: 100.
# This tag requires that the tag GENERATE_HTML is set to YES.

HTML_COLORSTYLE_SAT    = 100

# The HTML_COLORSTYLE_GAMMA tag controls the gamma correction applied to the
# luminance component of the colors in the HTML output. Values below 100
# gradually make the output lighter, whereas values above 100 make the output
# darker. The value divided by 100 is the actual gamma applied, so 80 represents
# a gamma of 0.8, The value 220 represents a gamma of 2.2, and 100 does not
# change the gamma.
# Minimum value: 40, maximum value: 240, default value: 80.
# This tag requires that the tag GENERATE_HTML is set to YES.

HTML_COLORSTYLE_GAMMA  = 80

# If the HTML_TIMESTAMP tag is set to YES then the footer of each generated HTML
# page will contain the date and time when the page was generated. Setting this
# to NO can help when comparing the output of multiple runs.
# The default value is: YES.
# This tag requires that the tag GENERATE_HTML is set to YES.
HTML_TIMESTAMP         = YES

# If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML
# documentation will contain sections that can be hidden and shown after the
# page has loaded.
# The default value is: NO.
# This tag requires that the tag GENERATE_HTML is set to YES.

HTML_DYNAMIC_SECTIONS  = NO

# With HTML_INDEX_NUM_ENTRIES one can control the preferred number of entries
# shown in the various tree structured indices initially; the user can expand
# and collapse entries dynamically later on. Doxygen will expand the tree to
# such a level that at most the specified number of entries are visible (unless
# a fully collapsed tree already exceeds this amount). So setting the number of
# entries 1 will produce a full collapsed tree by default. 0 is a special value
# representing an infinite number of entries and will result in a full expanded
# tree by default.
# Minimum value: 0, maximum value: 9999, default value: 100.
# This tag requires that the tag GENERATE_HTML is set to YES.

HTML_INDEX_NUM_ENTRIES = 100

# If the GENERATE_DOCSET tag is set to YES, additional index files will be
# generated that can be used as input for Apple's Xcode 3 integrated development
# environment (see: http://developer.apple.com/tools/xcode/), introduced with
# OSX 10.5 (Leopard). To create a documentation set, doxygen will generate a
# Makefile in the HTML output directory. Running make will produce the docset in
# that directory and running make install will install the docset in
# ~/Library/Developer/Shared/Documentation/DocSets so that Xcode will find it at
# startup. See http://developer.apple.com/tools/creatingdocsetswithdoxygen.html
# for more information.
# The default value is: NO.
# This tag requires that the tag GENERATE_HTML is set to YES.

GENERATE_DOCSET        = NO

# This tag determines the name of the docset feed.
# A documentation feed provides
# an umbrella under which multiple documentation sets from a single provider
# (such as a company or product suite) can be grouped.
# The default value is: Doxygen generated docs.
# This tag requires that the tag GENERATE_DOCSET is set to YES.

DOCSET_FEEDNAME        = "Doxygen generated docs"

# This tag specifies a string that should uniquely identify the documentation
# set bundle. This should be a reverse domain-name style string, e.g.
# com.mycompany.MyDocSet. Doxygen will append .docset to the name.
# The default value is: org.doxygen.Project.
# This tag requires that the tag GENERATE_DOCSET is set to YES.

DOCSET_BUNDLE_ID       = org.doxygen.Project

# The DOCSET_PUBLISHER_ID tag specifies a string that should uniquely identify
# the documentation publisher. This should be a reverse domain-name style
# string, e.g. com.mycompany.MyDocSet.documentation.
# The default value is: org.doxygen.Publisher.
# This tag requires that the tag GENERATE_DOCSET is set to YES.

DOCSET_PUBLISHER_ID    = org.doxygen.Publisher

# The DOCSET_PUBLISHER_NAME tag identifies the documentation publisher.
# The default value is: Publisher.
# This tag requires that the tag GENERATE_DOCSET is set to YES.

DOCSET_PUBLISHER_NAME  = Publisher

# If the GENERATE_HTMLHELP tag is set to YES then doxygen generates three
# additional HTML index files: index.hhp, index.hhc, and index.hhk. The
# index.hhp is a project file that can be read by Microsoft's HTML Help Workshop
# (see: http://www.microsoft.com/en-us/download/details.aspx?id=21138) on
# Windows.
#
# The HTML Help Workshop contains a compiler that can convert all HTML output
# generated by doxygen into a single compiled HTML file (.chm). Compiled HTML
# files are now used as the Windows 98 help format, and will replace the old
# Windows help format (.hlp) on all Windows platforms in the future. Compressed
# HTML files also contain an index, a table of contents, and you can search for
# words in the documentation.
# The HTML workshop also contains a viewer for
# compressed HTML files.
# The default value is: NO.
# This tag requires that the tag GENERATE_HTML is set to YES.

GENERATE_HTMLHELP      = NO

# The CHM_FILE tag can be used to specify the file name of the resulting .chm
# file. You can add a path in front of the file if the result should not be
# written to the html output directory.
# This tag requires that the tag GENERATE_HTMLHELP is set to YES.

CHM_FILE               =

# The HHC_LOCATION tag can be used to specify the location (absolute path
# including file name) of the HTML help compiler ( hhc.exe). If non-empty
# doxygen will try to run the HTML help compiler on the generated index.hhp.
# The file has to be specified with full path.
# This tag requires that the tag GENERATE_HTMLHELP is set to YES.

HHC_LOCATION           =

# The GENERATE_CHI flag controls if a separate .chi index file is generated (
# YES) or that it should be included in the master .chm file ( NO).
# The default value is: NO.
# This tag requires that the tag GENERATE_HTMLHELP is set to YES.

GENERATE_CHI           = NO

# The CHM_INDEX_ENCODING is used to encode HtmlHelp index ( hhk), content ( hhc)
# and project file content.
# This tag requires that the tag GENERATE_HTMLHELP is set to YES.

CHM_INDEX_ENCODING     =

# The BINARY_TOC flag controls whether a binary table of contents is generated (
# YES) or a normal table of contents ( NO) in the .chm file.
# The default value is: NO.
# This tag requires that the tag GENERATE_HTMLHELP is set to YES.

BINARY_TOC             = NO

# The TOC_EXPAND flag can be set to YES to add extra items for group members to
# the table of contents of the HTML help documentation and to the tree view.
# The default value is: NO.
# This tag requires that the tag GENERATE_HTMLHELP is set to YES.
TOC_EXPAND             = NO

# If the GENERATE_QHP tag is set to YES and both QHP_NAMESPACE and
# QHP_VIRTUAL_FOLDER are set, an additional index file will be generated that
# can be used as input for Qt's qhelpgenerator to generate a Qt Compressed Help
# (.qch) of the generated HTML documentation.
# The default value is: NO.
# This tag requires that the tag GENERATE_HTML is set to YES.

GENERATE_QHP           = NO

# If the QHG_LOCATION tag is specified, the QCH_FILE tag can be used to specify
# the file name of the resulting .qch file. The path specified is relative to
# the HTML output folder.
# This tag requires that the tag GENERATE_QHP is set to YES.

QCH_FILE               =

# The QHP_NAMESPACE tag specifies the namespace to use when generating Qt Help
# Project output. For more information please see Qt Help Project / Namespace
# (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#namespace).
# The default value is: org.doxygen.Project.
# This tag requires that the tag GENERATE_QHP is set to YES.

QHP_NAMESPACE          = org.doxygen.Project

# The QHP_VIRTUAL_FOLDER tag specifies the namespace to use when generating Qt
# Help Project output. For more information please see Qt Help Project / Virtual
# Folders (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#virtual-
# folders).
# The default value is: doc.
# This tag requires that the tag GENERATE_QHP is set to YES.

QHP_VIRTUAL_FOLDER     = doc

# If the QHP_CUST_FILTER_NAME tag is set, it specifies the name of a custom
# filter to add. For more information please see Qt Help Project / Custom
# Filters (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom-
# filters).
# This tag requires that the tag GENERATE_QHP is set to YES.

QHP_CUST_FILTER_NAME   =

# The QHP_CUST_FILTER_ATTRS tag specifies the list of the attributes of the
# custom filter to add. For more information please see Qt Help Project / Custom
# Filters (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom-
# filters).
# This tag requires that the tag GENERATE_QHP is set to YES.

QHP_CUST_FILTER_ATTRS  =

# The QHP_SECT_FILTER_ATTRS tag specifies the list of the attributes this
# project's filter section matches. Qt Help Project / Filter Attributes (see:
# http://qt-project.org/doc/qt-4.8/qthelpproject.html#filter-attributes).
# This tag requires that the tag GENERATE_QHP is set to YES.

QHP_SECT_FILTER_ATTRS  =

# The QHG_LOCATION tag can be used to specify the location of Qt's
# qhelpgenerator. If non-empty doxygen will try to run qhelpgenerator on the
# generated .qhp file.
# This tag requires that the tag GENERATE_QHP is set to YES.

QHG_LOCATION           =

# If the GENERATE_ECLIPSEHELP tag is set to YES, additional index files will be
# generated, together with the HTML files, they form an Eclipse help plugin. To
# install this plugin and make it available under the help contents menu in
# Eclipse, the contents of the directory containing the HTML and XML files needs
# to be copied into the plugins directory of eclipse. The name of the directory
# within the plugins directory should be the same as the ECLIPSE_DOC_ID value.
# After copying Eclipse needs to be restarted before the help appears.
# The default value is: NO.
# This tag requires that the tag GENERATE_HTML is set to YES.

GENERATE_ECLIPSEHELP   = NO

# A unique identifier for the Eclipse help plugin. When installing the plugin
# the directory name containing the HTML and XML files should also have this
# name. Each documentation set should have its own identifier.
# The default value is: org.doxygen.Project.
# This tag requires that the tag GENERATE_ECLIPSEHELP is set to YES.

ECLIPSE_DOC_ID         = org.doxygen.Project

# If you want full control over the layout of the generated HTML pages it might
# be necessary to disable the index and replace it with your own. The
# DISABLE_INDEX tag can be used to turn on/off the condensed index (tabs) at top
# of each HTML page. A value of NO enables the index and the value YES disables
# it.
# Since the tabs in the index contain the same information as the navigation
# tree, you can set this option to YES if you also set GENERATE_TREEVIEW to YES.
# The default value is: NO.
# This tag requires that the tag GENERATE_HTML is set to YES.

DISABLE_INDEX          = NO

# The GENERATE_TREEVIEW tag is used to specify whether a tree-like index
# structure should be generated to display hierarchical information. If the tag
# value is set to YES, a side panel will be generated containing a tree-like
# index structure (just like the one that is generated for HTML Help). For this
# to work a browser that supports JavaScript, DHTML, CSS and frames is required
# (i.e. any modern browser). Windows users are probably better off using the
# HTML help feature. Via custom stylesheets (see HTML_EXTRA_STYLESHEET) one can
# further fine-tune the look of the index. As an example, the default style
# sheet generated by doxygen has an example that shows how to put an image at
# the root of the tree instead of the PROJECT_NAME. Since the tree basically has
# the same information as the tab index, you could consider setting
# DISABLE_INDEX to YES when enabling this option.
# The default value is: NO.
# This tag requires that the tag GENERATE_HTML is set to YES.

GENERATE_TREEVIEW      = NO

# The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values that
# doxygen will group on one line in the generated HTML documentation.
#
# Note that a value of 0 will completely suppress the enum values from appearing
# in the overview section.
# Minimum value: 0, maximum value: 20, default value: 4.
# This tag requires that the tag GENERATE_HTML is set to YES.

ENUM_VALUES_PER_LINE   = 4

# If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be used
# to set the initial width (in pixels) of the frame in which the tree is shown.
# Minimum value: 0, maximum value: 1500, default value: 250.
# This tag requires that the tag GENERATE_HTML is set to YES.
TREEVIEW_WIDTH         = 250

# When the EXT_LINKS_IN_WINDOW option is set to YES doxygen will open links to
# external symbols imported via tag files in a separate window.
# The default value is: NO.
# This tag requires that the tag GENERATE_HTML is set to YES.

EXT_LINKS_IN_WINDOW    = NO

# Use this tag to change the font size of LaTeX formulas included as images in
# the HTML documentation. When you change the font size after a successful
# doxygen run you need to manually remove any form_*.png images from the HTML
# output directory to force them to be regenerated.
# Minimum value: 8, maximum value: 50, default value: 10.
# This tag requires that the tag GENERATE_HTML is set to YES.

FORMULA_FONTSIZE       = 10

# Use the FORMULA_TRANSPARENT tag to determine whether or not the images
# generated for formulas are transparent PNGs. Transparent PNGs are not
# supported properly for IE 6.0, but are supported on all modern browsers.
#
# Note that when changing this option you need to delete any form_*.png files in
# the HTML output directory before the changes have effect.
# The default value is: YES.
# This tag requires that the tag GENERATE_HTML is set to YES.

FORMULA_TRANSPARENT    = YES

# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax (see
# http://www.mathjax.org) which uses client side Javascript for the rendering
# instead of using prerendered bitmaps. Use this if you do not have LaTeX
# installed or if you want the formulas to look prettier in the HTML output.
# When enabled you may also need to install MathJax separately and configure the
# path to it using the MATHJAX_RELPATH option.
# The default value is: NO.
# This tag requires that the tag GENERATE_HTML is set to YES.

USE_MATHJAX            = NO

# When MathJax is enabled you can set the default output format to be used for
# the MathJax output. See the MathJax site (see:
# http://docs.mathjax.org/en/latest/output.html) for more details.
# Possible values are: HTML-CSS (which is slower, but has the best
# compatibility), NativeMML (i.e. MathML) and SVG.
# The default value is: HTML-CSS.
# This tag requires that the tag USE_MATHJAX is set to YES.

MATHJAX_FORMAT         = HTML-CSS

# When MathJax is enabled you need to specify the location relative to the HTML
# output directory using the MATHJAX_RELPATH option. The destination directory
# should contain the MathJax.js script. For instance, if the mathjax directory
# is located at the same level as the HTML output directory, then
# MATHJAX_RELPATH should be ../mathjax. The default value points to the MathJax
# Content Delivery Network so you can quickly see the result without installing
# MathJax. However, it is strongly recommended to install a local copy of
# MathJax from http://www.mathjax.org before deployment.
# The default value is: http://cdn.mathjax.org/mathjax/latest.
# This tag requires that the tag USE_MATHJAX is set to YES.

MATHJAX_RELPATH        = http://cdn.mathjax.org/mathjax/latest

# The MATHJAX_EXTENSIONS tag can be used to specify one or more MathJax
# extension names that should be enabled during MathJax rendering. For example
# MATHJAX_EXTENSIONS = TeX/AMSmath TeX/AMSsymbols
# This tag requires that the tag USE_MATHJAX is set to YES.

MATHJAX_EXTENSIONS     =

# The MATHJAX_CODEFILE tag can be used to specify a file with javascript pieces
# of code that will be used on startup of the MathJax code. See the MathJax site
# (see: http://docs.mathjax.org/en/latest/output.html) for more details. For an
# example see the documentation.
# This tag requires that the tag USE_MATHJAX is set to YES.

MATHJAX_CODEFILE       =

# When the SEARCHENGINE tag is enabled doxygen will generate a search box for
# the HTML output. The underlying search engine uses javascript and DHTML and
# should work on any modern browser.
# Note that when using HTML help
# (GENERATE_HTMLHELP), Qt help (GENERATE_QHP), or docsets (GENERATE_DOCSET)
# there is already a search function so this one should typically be disabled.
# For large projects the javascript based search engine can be slow, then
# enabling SERVER_BASED_SEARCH may provide a better solution. It is possible to
# search using the keyboard; to jump to the search box use <access key> + S
# (what the <access key> is depends on the OS and browser, but it is typically
# <CTRL>, <ALT>/